takeshi200ok/dpo-qwen-cot-merged
Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Feb 3, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

The takeshi200ok/dpo-qwen-cot-merged model is a 4-billion-parameter Qwen3-based causal language model developed by takeshi200ok. It has been fine-tuned with Direct Preference Optimization (DPO) to strengthen Chain-of-Thought (CoT) reasoning and improve the quality of structured responses. The weights are fully merged in BF16 (no separate adapter required), and the model is intended for producing aligned, coherent outputs on reasoning-intensive tasks.
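Since the weights are fully merged, the model should load directly with the Hugging Face `transformers` library, with no adapter-merging step. The sketch below is a hypothetical usage example, not documentation from the model's author: the generation settings and the assumption that the tokenizer ships a Qwen3 chat template are mine.

```python
# Hypothetical loading/inference sketch for takeshi200ok/dpo-qwen-cot-merged.
# Assumes transformers and a PyTorch backend are installed; generation
# parameters are illustrative, not recommended values from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "takeshi200ok/dpo-qwen-cot-merged"


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Run a single chat-style completion and return the decoded reply."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="bfloat16",  # the card lists BF16 weights
        device_map="auto",
    )
    # Qwen3-family tokenizers ship a chat template, so we format the
    # prompt as a chat turn rather than raw text (an assumption here).
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the model's reply is decoded.
    reply_ids = output_ids[0][input_ids.shape[-1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Explain step by step why the sum of two odd numbers is even."))
```

With a 32k context window, long CoT traces fit comfortably, but a 4B BF16 model still needs roughly 8 GB of accelerator memory for the weights alone, so `device_map="auto"` is used to let `transformers` place layers across available devices.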
