CriteriaPO/qwen2.5-3b-dpo-vanilla
Text generation · Concurrency cost: 1 · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: May 4, 2025 · Architecture: Transformer

CriteriaPO/qwen2.5-3b-dpo-vanilla is a 3.1-billion-parameter language model fine-tuned by CriteriaPO using Direct Preference Optimization (DPO). Building on CriteriaPO/qwen2.5-3b-sft-10, it uses preference data to align its outputs more closely with human preferences. With a context length of 32,768 tokens, it is designed for conversational AI and instruction-following tasks where nuanced, preference-aligned responses are critical.
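To make the DPO objective concrete, here is a minimal, framework-free sketch of the per-pair loss from Rafailov et al. (2023). The function name and scalar inputs are illustrative, not from this model's training code: in practice the log-probabilities are per-token sums computed over batches by the policy and a frozen reference model, and `beta` is a training hyperparameter whose value for this model is not published here.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single (chosen, rejected) preference pair.

    Computes -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)),
    where each log-ratio compares the policy to the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable -log(sigmoid(margin)) = log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))
```

When the policy has not moved relative to the reference (zero margin), the loss is log 2; as the policy assigns relatively more probability to the chosen response than the rejected one, the loss decreases toward zero, which is what drives the alignment described above.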
