KhaledScience/dpo-qwen-cot-merged
Text generation · Model size: 4B · Quant: BF16 · Context length: 32k · Concurrency cost: 1 · Published: Feb 18, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights
KhaledScience/dpo-qwen-cot-merged is a 4-billion-parameter instruction-tuned causal language model based on Qwen3, developed by KhaledScience. It was fine-tuned with Direct Preference Optimization (DPO) on Chain-of-Thought (CoT) reasoning data, with the goal of improving reasoning ability and producing structured, preference-aligned responses.
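For readers unfamiliar with DPO: it trains the policy directly on preference pairs (a chosen and a rejected response) by rewarding the policy for widening its log-probability margin on the chosen response relative to a frozen reference model. The sketch below shows the per-pair loss in plain Python; the function name and the example log-probabilities are illustrative, not taken from this model's training setup.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    under the policy or the frozen reference model. `beta` scales
    how strongly the policy is pulled away from the reference.
    """
    # How much more the policy likes each response than the reference does
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Margin the loss tries to push positive
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): near log(2) at margin 0, -> 0 as margin grows
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With identical policy and reference log-probabilities the margin is zero and the loss is log(2) ≈ 0.693; as the policy assigns relatively more probability to the chosen response, the loss falls toward zero.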