CriteriaPO/qwen2.5-3b-dpo-coarse
Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: May 4, 2025 · Architecture: Transformer · Warm

CriteriaPO/qwen2.5-3b-dpo-coarse is a 3.1-billion-parameter language model fine-tuned from CriteriaPO/qwen2.5-3b-sft-10. It was trained with Direct Preference Optimization (DPO), which fine-tunes the model directly on pairs of preferred and rejected responses so that its outputs align more closely with human preferences. It is designed for general text generation tasks and builds on the Qwen2.5 architecture with a 32,768-token context length.
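For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO loss, not this model's actual training code. The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions; the model card does not state the hyperparameters used. The inputs are summed log-probabilities of the preferred ("chosen") and rejected responses under the model being trained and under the frozen reference model (here, presumably the SFT checkpoint).

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    beta=0.1 is an assumed illustrative value, not taken from the model card.
    """
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Positive margin means the policy favors the chosen response more than the reference.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow for large |x|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, so the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy shifts probability toward the chosen response: the loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model, which is what keeps the fine-tuned model from drifting arbitrarily far from its starting point.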
