chenyongxi/Qwen2.5-1.5B-SFT-DPO-InfinityPreference
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Apr 3, 2026 · Architecture: Transformer

chenyongxi/Qwen2.5-1.5B-SFT-DPO-InfinityPreference is a 1.5-billion-parameter language model fine-tuned from Qwen2.5. After supervised fine-tuning (SFT), it was further trained with Direct Preference Optimization (DPO) on the BAAI/Infinity-Preference dataset, so its responses are aligned with human preferences. With a 32,768-token context length, it offers a compact option for tasks that call for preference-aligned text generation.
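A minimal usage sketch with Hugging Face `transformers` is shown below. It assumes the model is hosted on the Hugging Face Hub under the ID above and, like other Qwen2.5 models, uses the ChatML chat template (`<|im_start|>role ... <|im_end|>`); `build_chatml_prompt` is a hypothetical helper included only to illustrate the prompt format the tokenizer's `apply_chat_template` would normally produce.

```python
def build_chatml_prompt(messages):
    """Render a list of {'role', 'content'} messages into the ChatML format
    used by Qwen2.5, mirroring tokenizer.apply_chat_template with
    add_generation_prompt=True. Illustrative helper, not part of the model."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)


if __name__ == "__main__":
    # Heavy path: downloads the BF16 weights (~3 GB); shown for illustration.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "chenyongxi/Qwen2.5-1.5B-SFT-DPO-InfinityPreference"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, dropping the prompt.
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Since the model is DPO-tuned for chat-style preference alignment, prompting it through the chat template (rather than raw completion) keeps inputs in the distribution it was trained on.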
