Itohiro2929/dpo-qwen-cot-merged
Text generation
Concurrency cost: 1
Model size: 4B
Quantization: BF16
Context length: 32k
Published: Feb 8, 2026
License: apache-2.0
Architecture: Transformer (open weights)

Itohiro2929/dpo-qwen-cot-merged is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It was trained with Direct Preference Optimization (DPO) on a preference dataset to align outputs with preferred responses, strengthening Chain-of-Thought (CoT) reasoning and improving the quality of structured answers. This makes it suitable for tasks that require clear logical flow and structured answer generation.
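For context on the training objective, the DPO loss can be sketched as below. This is a minimal illustrative implementation of the standard per-pair DPO loss, not code from this model's actual training run; the function name, arguments, and `beta` value are illustrative assumptions.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being trained (pi_*) or the frozen reference model
    (ref_*). The loss is -log(sigmoid(beta * margin)), where the margin
    compares how much more the policy prefers the chosen response over
    the rejected one, relative to the reference model.
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Numerically stable -log(sigmoid(logits))
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy matches the reference model exactly, the margin is zero and the loss is log(2); as the policy assigns relatively more probability to the chosen response, the loss decreases, which is the alignment pressure DPO applies during fine-tuning.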
