hiro7ka/dpo-qwen-cot-merged
Text Generation · Model size: 4B · Quant: BF16 · Context length: 32k · Published: Feb 27, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights
The hiro7ka/dpo-qwen-cot-merged model is a 4-billion-parameter, Qwen3-based, instruction-tuned causal language model, fine-tuned by hiro7ka with Direct Preference Optimization (DPO) via Unsloth. It is optimized for chain-of-thought reasoning and structured response quality, and supports a 32,768-token context window. The model is designed to produce aligned, coherent outputs, making it suitable for tasks that require robust logical progression and well-structured answers.
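A minimal usage sketch with the Hugging Face Transformers library is shown below. The generation settings and the example prompt are assumptions for illustration; only the model ID comes from this card. The heavy imports are deferred into `main()` so the prompt-building helper can be used without Transformers installed.

```python
MODEL_ID = "hiro7ka/dpo-qwen-cot-merged"  # model ID from this card


def build_messages(question: str) -> list[dict]:
    # Chat-style message list; the tokenizer's chat template renders it
    # into the model's expected prompt format.
    return [{"role": "user", "content": question}]


def main() -> None:
    # Deferred imports: transformers and torch are only needed at run time.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed on this card (assumed to fit your hardware).
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")

    messages = build_messages("Explain step by step why 17 is a prime number.")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Because the model was DPO-tuned for chain-of-thought, prompts that explicitly ask for step-by-step reasoning (as above) are a natural fit.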