ryosao/dpo-qwen-cot-merged
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Feb 3, 2026License:apache-2.0Architecture:Transformer Open Weights Warm
ryosao/dpo-qwen-cot-merged is a 4 billion parameter Qwen3-based causal language model fine-tuned by ryosao using Direct Preference Optimization (DPO). This model is specifically optimized for improving reasoning capabilities through Chain-of-Thought (CoT) and enhancing structured response quality. It is designed for tasks requiring aligned, high-quality outputs based on preferred response patterns.
Loading preview...
Model Overview
ryosao/dpo-qwen-cot-merged is a 4 billion parameter language model built upon the Qwen3-4B-Instruct-2507 base model. It has been fine-tuned by ryosao using Direct Preference Optimization (DPO) via the Unsloth library, with its 16-bit weights fully merged for direct use without adapter loading.
Key Capabilities
- Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, making it suitable for complex problem-solving and logical deduction tasks.
- Improved Response Quality: Aligned through DPO to produce higher quality and more structured outputs, based on a preference dataset (u-10bei/dpo-dataset-qwen-cot).
- Direct Usage: As a fully merged model, it can be loaded and used directly with the
transformerslibrary, simplifying deployment.
Good For
- Applications requiring models with strong reasoning abilities.
- Use cases where structured and high-quality responses are critical.
- Developers looking for a Qwen3-based model with DPO-enhanced alignment for specific output preferences.