CriteriaPO/llama3.2-3b-dpo-mini
Text generation · Concurrency cost: 1 · Model size: 3.2B · Quant: BF16 · Context length: 32k · Published: May 9, 2025 · Architecture: Transformer

CriteriaPO/llama3.2-3b-dpo-mini is a fine-tuned version of CriteriaPO/llama3.2-3b-sft-10, developed by CriteriaPO. The model was trained with Direct Preference Optimization (DPO) via the TRL framework, so its outputs are aligned toward responses preferred in the training data. It generates text from user prompts and is suited to general text-generation tasks where preference-based fine-tuning is beneficial.
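For intuition, here is a minimal sketch of the DPO objective that TRL's DPO training optimizes: given log-probabilities of a chosen and a rejected response under the policy and the frozen reference (SFT) model, it penalizes the policy for preferring the rejected response. The function name and the β value are illustrative, not taken from this model's training configuration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen/rejected
    response under the policy or the frozen reference (SFT) model.
    """
    # Implicit reward margins relative to the reference model
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed in a numerically stable way
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Equal margins give the neutral loss log(2); when the policy favors the
# chosen response more than the reference does, the loss drops below log(2).
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
aligned = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

Gradient descent on this loss pushes the policy's likelihood margin for chosen over rejected responses above the reference model's, which is the alignment effect described above.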
