Shusuke07/qwen3-4b-dpo-qwen-cot-_2-3_05_DPO
Task: Text generation
Model size: 4B
Quantization: BF16
Context length: 32k
Published: Feb 8, 2026
License: apache-2.0
Architecture: Transformer (open weights)
Concurrency cost: 1

Shusuke07/qwen3-4b-dpo-qwen-cot-_2-3_05_DPO is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO) to improve reasoning quality and the structure of its responses. The fine-tuning targets aligned outputs, particularly Chain-of-Thought reasoning, making the model suited to tasks that demand logical coherence.
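A minimal usage sketch with Hugging Face Transformers is shown below. The model ID is taken from this page; the chat-template usage follows the standard convention for Qwen3-Instruct-derived models and is an assumption, not something documented on this page, so verify the template against the tokenizer config before relying on it.

```python
MODEL_ID = "Shusuke07/qwen3-4b-dpo-qwen-cot-_2-3_05_DPO"  # model ID from this page


def build_chat(question: str) -> list[dict]:
    """Wrap a user question in the chat-message format expected by
    `tokenizer.apply_chat_template` (assumed Qwen3-Instruct convention)."""
    return [{"role": "user", "content": question}]


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a response. Imports are deferred so the
    prompt helper above can be used without transformers installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_chat(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("If a train travels 60 km in 45 minutes, "
                   "what is its average speed in km/h?"))
```

Because the model was tuned for Chain-of-Thought outputs, responses may include intermediate reasoning before the final answer; budget `max_new_tokens` accordingly.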
