masachika/qwen3-4b-dpo-cot-merged
Text generation | Concurrency cost: 1 | Model size: 4B | Quantization: BF16 | Context length: 32k | Published: Feb 23, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights
The masachika/qwen3-4b-dpo-cot-merged model is a 4-billion-parameter, Qwen3-based language model fine-tuned for improved reasoning and structured output generation. It underwent a two-stage fine-tuning process: first, Supervised Fine-Tuning (SFT) on structured-output datasets (JSON, YAML, XML, TOML, CSV), followed by Direct Preference Optimization (DPO) for alignment and enhanced reasoning quality. The model is designed to provide aligned responses and to generate data in structured formats reliably, making it suitable for tasks that require precise output formatting and logical coherence.
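A minimal usage sketch, assuming the model is served via the Hugging Face `transformers` library (the card does not specify a serving stack, so the loading code, dtype choice, and prompt below are illustrative assumptions). The `generate()` function shows one plausible way to request JSON output from the model; `extract_json()` is a plain helper that pulls the JSON payload out of a fenced or bare response.

```python
"""Sketch: prompting masachika/qwen3-4b-dpo-cot-merged for structured output.

generate() is a hypothetical inference sketch (requires `transformers`,
`torch`, and the model weights); extract_json() is a stdlib-only helper
for parsing the structured part of a response.
"""
import json
import re


def extract_json(text: str) -> dict:
    """Parse the first ```json fenced block in a response, or the text itself
    if no fence is present."""
    match = re.search(r"```json\s*(.*?)```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


def generate(prompt: str) -> str:
    """Hypothetical call pattern; the model id comes from the card above."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # assumed installed

    model_id = "masachika/qwen3-4b-dpo-cot-merged"
    tok = AutoTokenizer.from_pretrained(model_id)
    # BF16 matches the quantization listed on the card.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    messages = [{"role": "user", "content": prompt}]
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the prompt.
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    reply = generate(
        "Return only a JSON object with keys 'name' and 'year' "
        "describing the Eiffel Tower."
    )
    print(extract_json(reply))
```

Parsing the response through a strict helper like `extract_json()` (rather than trusting the raw string) is a common guard even with format-tuned models, since generation can still wrap the payload in prose or a code fence.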