naru0411/LLM-competition-SFT-DPO

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 5, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

naru0411/LLM-competition-SFT-DPO is a 4 billion parameter LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, Unsloth). This adapter is specifically trained to enhance structured output accuracy for formats like JSON, YAML, XML, TOML, and CSV. It achieves this by applying loss only to the final assistant output, masking intermediate reasoning, making it suitable for applications requiring precise data formatting.

Loading preview...

Model Overview

This repository provides a LoRA adapter (4 billion parameters) fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. The fine-tuning was performed using QLoRA (4-bit, Unsloth), and the repository contains only the adapter weights, requiring the base model to be loaded separately.

Key Capabilities

  • Enhanced Structured Output: The primary objective of this adapter is to significantly improve the accuracy of generating structured data formats such as JSON, YAML, XML, TOML, and CSV.
  • Targeted Loss Application: During training, loss was exclusively applied to the final assistant output, with intermediate Chain-of-Thought reasoning masked. This focuses the model's learning on producing correct structured responses.

Training Details

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Method: QLoRA (4-bit)
  • Max Sequence Length: 1024 tokens
  • Epochs: 1
  • Learning Rate: 4e-06
  • LoRA Configuration: r=64, alpha=128
  • Training Data: The adapter was trained using the u-10bei/structured_data_with_cot_dataset_512_v2 dataset, which is distributed under the MIT License.

Good For

  • Applications requiring reliable and accurate generation of structured data (e.g., API calls, data extraction, configuration files).
  • Developers looking to integrate a compact, specialized adapter for structured output tasks with a Qwen3-4B base model.