CriteriaPO/qwen2.5-3b-dpo-mini
Text Generation | Concurrency Cost: 1 | Model Size: 3.1B | Quant: BF16 | Ctx Length: 32k | Published: May 4, 2025 | Architecture: Transformer | Warm

CriteriaPO/qwen2.5-3b-dpo-mini is a 3-billion-parameter language model fine-tuned by CriteriaPO with Direct Preference Optimization (DPO). It is based on CriteriaPO/qwen2.5-3b-sft-10 and is optimized to generate responses aligned with human preferences. It is suited to conversational AI and instruction-following tasks where it matters that the model produces the response people would prefer.
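
For readers unfamiliar with DPO, the quantity it minimizes for a single preference pair can be sketched as below. This is a minimal illustration of the standard DPO loss, not CriteriaPO's training code: the function name `dpo_loss`, the default `beta=0.1`, and the sample log-probabilities are all assumptions chosen for illustration. In this setting the reference model would be the SFT checkpoint and the policy would be the model being fine-tuned.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta (hypothetical default) scales how strongly the policy is pushed
    away from the reference.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Illustrative values only: the policy equals the reference, so the loss is log(2).
baseline = dpo_loss(-11.0, -11.0, -11.0, -11.0)

# The policy now favors the chosen response more than the reference does,
# so the loss drops below the baseline.
improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

The loss falls as the policy assigns relatively more probability to the preferred response than the reference model does, which is the sense in which a DPO-tuned model is "aligned with human preferences".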
