CriteriaPO/qwen2.5-3b-dpo-coarse is a 3.1-billion-parameter language model fine-tuned from CriteriaPO/qwen2.5-3b-sft-10 using Direct Preference Optimization (DPO), which trains the model to better align its outputs with human preferences. It is intended for general text generation tasks and builds on the Qwen2.5 architecture with a 32,768-token context length.
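A minimal usage sketch for text generation with this checkpoint via the Hugging Face `transformers` library. This is an assumption about how the model would typically be loaded, not an official recipe from the model authors; the `generate` helper below is hypothetical, and the sketch assumes `transformers` and `torch` are installed and the repo id is reachable on the Hub.

```python
# Hypothetical usage sketch: load the DPO-tuned checkpoint and generate text.
# Not from the model card; a typical transformers causal-LM workflow is assumed.

MODEL_ID = "CriteriaPO/qwen2.5-3b-dpo-coarse"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    # Imports are kept local so the sketch can be read (and inspected)
    # without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Summarize Direct Preference Optimization in one sentence."))
```

Because the checkpoint was preference-tuned rather than task-specific, plain free-form prompts like the one above are a reasonable starting point; sampling parameters (temperature, top-p) can be passed through `model.generate` as needed.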