Takashi-0000/dpo-qwen-cot-merged0
Text Generation
Concurrency Cost: 1
Model Size: 4B
Quant: BF16
Ctx Length: 32k
Published: Mar 1, 2026
License: apache-2.0
Architecture: Transformer
Open Weights · Warm
Takashi-0000/dpo-qwen-cot-merged0 is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It was trained with Direct Preference Optimization (DPO) to strengthen Chain-of-Thought (CoT) reasoning and improve the quality of structured responses. The model is tuned for aligned, coherent output, making it suitable for tasks that require clear logical flow and well-organized answers.
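To make the DPO objective concrete, here is a minimal numeric sketch of the per-example loss: negative log-sigmoid of a scaled margin comparing how much more the policy prefers the chosen response over the rejected one than a frozen reference model does. The function, the `beta` value, and the log-probabilities below are illustrative assumptions, not details of this model's actual training run.

```python
import math

def dpo_loss(logp_chosen: float, ref_chosen: float,
             logp_rejected: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * margin).

    The margin measures how much more the policy prefers the chosen
    response over the rejected one, relative to the reference model.
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative (made-up) sequence log-probabilities: the policy favors
# the chosen response more than the reference does, so the margin is
# positive and the loss drops below log(2), its value at zero margin.
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

At zero margin the loss is exactly log(2); training pushes the margin positive, driving the loss toward zero.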