harutoshi/dpo-qwen-cot-merged
Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Feb 3, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights
The harutoshi/dpo-qwen-cot-merged model is a 4-billion-parameter variant of Qwen3-4B-Instruct-2507, fine-tuned with Direct Preference Optimization (DPO) using the Unsloth library. The fine-tuning targets stronger Chain-of-Thought (CoT) reasoning and higher-quality structured responses. It is optimized for tasks that require aligned, preference-tuned outputs, making it suitable for applications where response coherence and logical flow are critical.
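As a rough illustration of how such a model might be used, here is a minimal sketch that loads the checkpoint with the Hugging Face `transformers` library and prompts it for step-by-step reasoning. The system prompt, generation parameters, and example question are illustrative assumptions, not values published with the model.

```python
# Minimal usage sketch for harutoshi/dpo-qwen-cot-merged (assumptions:
# standard Hugging Face transformers chat interface; prompt and generation
# settings are illustrative, not published with the model).

MODEL_ID = "harutoshi/dpo-qwen-cot-merged"


def build_messages(question: str) -> list[dict]:
    """Build a chat-format prompt that encourages Chain-of-Thought reasoning."""
    return [
        {"role": "system", "content": "Reason step by step before answering."},
        {"role": "user", "content": question},
    ]


def main() -> None:
    # Heavy dependencies are imported lazily so the prompt helper above
    # stays usable without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the published quantization of this checkpoint.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = build_messages(
        "If a train travels 60 km in 45 minutes, what is its average speed in km/h?"
    )
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Greedy decoding; sampling settings would be tuned per application.
    output = model.generate(inputs, max_new_tokens=512, do_sample=False)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

The chat-template path is used because DPO fine-tunes of instruct models are normally served through the base model's chat format rather than raw completion prompts.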