moushi21/dpo-qwen-cot-merged20
Task: Text Generation
Concurrency Cost: 1
Model Size: 4B
Quantization: BF16
Context Length: 32k
Published: Feb 22, 2026
License: apache-2.0
Architecture: Transformer (open weights)

moushi21/dpo-qwen-cot-merged20 is a 4-billion-parameter Qwen3-based causal language model, fine-tuned with a four-stage iterative SFT and DPO process. Developed by moushi21, it is optimized for structured data reasoning and Chain-of-Thought (CoT) generation, targeting tasks that require strict adherence to complex data formats and consistent, high-fidelity outputs. The model is designed for structural evaluation (StructEval-T) and supports a context length of 32768 tokens.
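The card does not include a usage snippet. As a minimal sketch, assuming the model follows the standard Qwen ChatML-style chat template (the `<|im_start|>`/`<|im_end|>` markers), a single-turn CoT prompt for a structured-output request might be assembled like this; the system prompt and record shown are illustrative, and in practice `tokenizer.apply_chat_template` from the `transformers` library would build this string for you:

```python
# Sketch: assemble a Qwen ChatML-style chat prompt for a CoT,
# structured-output request. The markers follow the common Qwen
# convention; the prompts themselves are hypothetical examples.

def build_chat_prompt(system: str, user: str) -> str:
    """Return a single-turn ChatML-style prompt, ending at the
    assistant turn so the model continues from there."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chat_prompt(
    "Reason step by step, then emit the final answer as JSON.",
    "Convert this record to JSON: name=Ada, born=1815",
)
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the prompt open at the assistant turn, which is what lets a causal LM generate the CoT reasoning and the structured answer as its continuation.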
