Name: naru0411/LLM-competition-SFT-DPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: naru0411

Model Overview

This repository provides a LoRA adapter (4 billion parameters) fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. The fine-tuning was performed using QLoRA (4-bit, Unsloth), and the repository contains only the adapter weights, requiring the base model to be loaded separately.

Key Capabilities

Enhanced Structured Output: The primary objective of this adapter is to significantly improve the accuracy of generating structured data formats such as JSON, YAML, XML, TOML, and CSV.
Targeted Loss Application: During training, loss was exclusively applied to the final assistant output, with intermediate Chain-of-Thought reasoning masked. This focuses the model's learning on producing correct structured responses.

Training Details

Base Model: Qwen/Qwen3-4B-Instruct-2507
Method: QLoRA (4-bit)
Max Sequence Length: 1024 tokens
Epochs: 1
Learning Rate: 4e-06
LoRA Configuration: r=64, alpha=128
Training Data: The adapter was trained using the u-10bei/structured_data_with_cot_dataset_512_v2 dataset, which is distributed under the MIT License.

Good For

Applications requiring reliable and accurate generation of structured data (e.g., API calls, data extraction, configuration files).
Developers looking to integrate a compact, specialized adapter for structured output tasks with a Qwen3-4B base model.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)