Chufeng-Jiang/Qwen2.5-1.5B-HumanPreference-DPO
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 7, 2026 · Architecture: Transformer
Chufeng-Jiang/Qwen2.5-1.5B-HumanPreference-DPO is a 1.5-billion-parameter language model based on the Qwen2.5 architecture and fine-tuned with Direct Preference Optimization (DPO). The DPO training aligns the model with human preference data, improving conversational quality, adherence to desired output styles, and the generation of nuanced, human-like responses.
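A minimal usage sketch, assuming the model follows the standard Hugging Face `transformers` chat interface used by Qwen2.5-based models; the repository id comes from this card, while the prompt and generation settings are illustrative:

```python
# Sketch: loading and querying the DPO-tuned model via the Hugging Face
# "transformers" library. Assumes the repo ships the usual Qwen2.5 chat
# template; generation parameters here are illustrative defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Chufeng-Jiang/Qwen2.5-1.5B-HumanPreference-DPO"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and produce a single chat completion."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    # Qwen2.5 models ship a chat template; apply it to format the prompt.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        outputs[0][inputs.shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("Explain DPO in one sentence."))
```

Because the model fits in BF16 at 1.5B parameters, it can typically run on a single consumer GPU or on CPU, subject to available memory.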