naru0411/LLM-competition-SFT-DPO
Text generation · Model size: 4B · Quant: BF16 · Context length: 32k · Published: Feb 5, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

naru0411/LLM-competition-SFT-DPO is a LoRA adapter for the 4-billion-parameter Qwen/Qwen3-4B-Instruct-2507, fine-tuned with QLoRA (4-bit quantization, via Unsloth). The adapter is trained specifically to improve structured-output accuracy for formats such as JSON, YAML, XML, TOML, and CSV. During training, loss is applied only to the final assistant output while intermediate reasoning tokens are masked, which makes the model well suited to applications that require precise data formatting.
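The loss-masking idea described above can be sketched as follows. This is a minimal illustration of response-only training labels, not the card's actual training code: tokens before the final answer are assigned the ignore index `-100` (the default `ignore_index` of PyTorch's cross-entropy loss), so they contribute nothing to the gradient. The token IDs and the `answer_start` boundary below are made-up values for demonstration.

```python
# Response-only loss masking sketch: everything before the final assistant
# answer (prompt + intermediate reasoning) is labeled -100 so that
# cross-entropy loss ignores it and training signal comes only from the
# final structured output.

IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss default ignore_index

def mask_labels(token_ids, answer_start):
    """Copy token_ids into labels, masking all positions before answer_start."""
    return [IGNORE_INDEX if i < answer_start else tid
            for i, tid in enumerate(token_ids)]

# Hypothetical 6-token sequence; the final answer begins at index 4.
tokens = [101, 2023, 2003, 1037, 3231, 102]
labels = mask_labels(tokens, answer_start=4)
print(labels)  # [-100, -100, -100, -100, 3231, 102]
```

In a real fine-tuning setup the `answer_start` boundary would be located by finding the last assistant turn in the tokenized chat template rather than hard-coded.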
