tabidance/dpo-qwen-cot-merged
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm
tabidance/dpo-qwen-cot-merged is a 4-billion-parameter variant of Qwen3-4B-Instruct-2507, fine-tuned with Direct Preference Optimization (DPO) using Unsloth. The fine-tuning targets stronger Chain-of-Thought reasoning and better-structured responses. Trained on preference data, the model produces aligned, coherent outputs, making it well suited to tasks that demand precise, structured language generation.
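Since the base model is Qwen3-4B-Instruct-2507, prompting is assumed to follow the ChatML template used by the Qwen instruct family. As a minimal offline sketch, the helper below renders such a prompt by hand; in practice you would load the tokenizer for `tabidance/dpo-qwen-cot-merged` with Hugging Face `transformers` and call `tokenizer.apply_chat_template` instead.

```python
def build_chatml_prompt(messages):
    """Render a list of {"role": ..., "content": ...} dicts into the
    ChatML format used by Qwen instruct models, ending with an open
    assistant turn so generation continues from there."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # model completes this turn
    return "".join(parts)

# Example: a system message nudging Chain-of-Thought, plus a user query.
prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant. Think step by step."},
    {"role": "user", "content": "What is 17 * 24?"},
])
print(prompt)
```

The resulting string can be tokenized and passed to the model for generation; the system message here is only an illustration of how one might elicit the model's Chain-of-Thought behavior.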