Ryu19940329/dpo-qwen-cot-merged
Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Feb 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Ryu19940329/dpo-qwen-cot-merged is a 4-billion-parameter model: a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, via Unsloth) and merged back into the base weights. It is trained specifically to improve structured-output accuracy for formats such as JSON, YAML, XML, TOML, and CSV. During training, loss is applied only to the final assistant output; intermediate Chain-of-Thought reasoning is masked so the model optimizes for direct structured responses.
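The loss-masking scheme described above can be sketched in a few lines. This is a hedged illustration, not the author's training code: `mask_labels`, the span layout, and the use of `-100` as the ignore index (the convention used by common Transformer trainers) are assumptions for the example.

```python
# Illustrative sketch of CoT-masked label construction (not the card author's code).
# Prompt and intermediate Chain-of-Thought tokens get the ignore index so they
# contribute no loss; only the final assistant output keeps its real token ids.
IGNORE_INDEX = -100  # assumption: the ignore index conventionally used by CE-loss trainers

def mask_labels(token_ids, spans):
    """Build a labels list from token_ids.

    spans: list of (start, end, keep) half-open ranges over token_ids.
    Tokens inside a keep=True span are supervised; everything else is masked.
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end, keep in spans:
        if keep:
            labels[start:end] = token_ids[start:end]
    return labels

# Example sequence: [prompt | chain-of-thought | final structured answer]
tokens = [101, 102, 103, 201, 202, 203, 301, 302]
labels = mask_labels(
    tokens,
    [(0, 3, False),   # prompt: masked
     (3, 6, False),   # intermediate CoT: masked
     (6, 8, True)],   # final assistant output: supervised
)
print(labels)  # → [-100, -100, -100, -100, -100, -100, 301, 302]
```

With labels built this way, a standard cross-entropy loss over the sequence reduces to a loss over the final structured answer only, matching the behavior the card describes.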
