yukiakari/dpo-qwen-cot-merged
Text Generation · Open Weights · Status: Warm
Concurrency cost: 1 · Model size: 4B · Quant: BF16 · Context length: 32k
Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer
The yukiakari/dpo-qwen-cot-merged model is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. Trained with Direct Preference Optimization (DPO) using the Unsloth library, it targets stronger Chain-of-Thought (CoT) reasoning and better-structured responses, making it suited to tasks that require robust logical inference and well-organized outputs.
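A minimal usage sketch, assuming the weights are hosted on the Hugging Face Hub under the repo id shown on this card and loaded with the `transformers` library; the system prompt wording is illustrative, not part of the model's training setup. The helper that builds the chat messages is separated from the generation call so it can be inspected without downloading the 4B checkpoint.

```python
# Sketch: querying yukiakari/dpo-qwen-cot-merged through transformers.
# The repo id and BF16 dtype come from the model card; everything else
# (prompt wording, generation settings) is an illustrative assumption.

def build_cot_messages(question: str) -> list[dict]:
    """Wrap a user question in a chat format that elicits step-by-step reasoning."""
    return [
        {"role": "system",
         "content": "Reason step by step, then state the final answer."},
        {"role": "user", "content": question},
    ]

def generate(question: str, max_new_tokens: int = 512) -> str:
    # Heavy imports are local so build_cot_messages works without transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "yukiakari/dpo-qwen-cot-merged"  # repo id from the card
    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo,
        torch_dtype=torch.bfloat16,  # BF16, matching the card's quant field
        device_map="auto",
    )
    inputs = tok.apply_chat_template(
        build_cot_messages(question),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Since the model fits in roughly 8 GB at BF16, `device_map="auto"` should place it on a single consumer GPU when one is available.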