CriteriaPO/qwen2.5-3b-dpo-finegrained
Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: May 4, 2025 · Architecture: Transformer · Warm

CriteriaPO/qwen2.5-3b-dpo-finegrained is a 3.1-billion-parameter language model fine-tuned by CriteriaPO with Direct Preference Optimization (DPO) on top of the Qwen2.5-3B-SFT-10 base model. DPO training aligns the model's outputs with human preference data, and its 32K-token context window supports long, nuanced responses, making it well suited to conversational AI and instruction-following tasks.
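For intuition, the per-pair DPO objective used in this style of fine-tuning can be sketched in a few lines. This is an illustrative standalone function, not the training code used for this model; the function name and the β value are assumptions for the example.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy (the model being trained) and under the
    frozen reference model (here, the SFT base).
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # Loss is -log sigmoid of the scaled reward margin; it shrinks as the
    # policy prefers the chosen response more strongly than the reference.
    margin = beta * (chosen_reward - rejected_reward)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is zero and the
# loss is log(2); favoring the chosen response lowers it.
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

Increasing β sharpens the penalty for deviating from the reference model's preference ordering, which is the main knob trading alignment strength against drift from the SFT base.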
