Name: Ryu19940329/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Ryu19940329

Overview

This repository provides a LoRA adapter developed by Ryu19940329, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. The adapter was trained with QLoRA (4-bit, via Unsloth). The repository contains only the adapter weights, so the base model must be loaded separately.
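Because only the adapter weights are shipped, a typical loading flow fetches the base model first and then attaches the adapter. The following is a minimal sketch, assuming the `transformers` and `peft` libraries are installed. The adapter repository id is an assumption taken from the listing name (which says "merged", so the actual adapter location may differ), and `load_adapter_model` is a hypothetical helper name.

```python
BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"  # base model named in the overview
# Assumption: adapter repo id copied from the listing name; verify before use.
ADAPTER_ID = "Ryu19940329/dpo-qwen-cot-merged"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the LoRA adapter on top of it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` downloads both repositories on first use, so it needs network access and enough memory for a 4B-parameter model.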

Key Capabilities

Enhanced Structured Output: The adapter's primary objective is to improve the accuracy of structured outputs in JSON, YAML, XML, TOML, and CSV.
Targeted Loss Application: During training, loss is applied only to the final assistant output; intermediate Chain-of-Thought reasoning is masked. This focuses learning on producing precise structured responses.
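The loss masking described above can be illustrated with plain token lists. This is a sketch of the general technique, not the author's training code: it assumes labels set to -100 are skipped by the loss (the default `ignore_index` of PyTorch's cross-entropy loss), and `mask_labels` and the toy token ids are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_labels(input_ids: list[int], output_start: int) -> list[int]:
    """Copy input_ids into labels, masking every position before output_start."""
    if not 0 <= output_start <= len(input_ids):
        raise ValueError("output_start must lie within the sequence")
    return [IGNORE_INDEX] * output_start + list(input_ids[output_start:])


# Toy token ids: prompt (3 tokens), reasoning (2 tokens), final output (3 tokens).
input_ids = [11, 12, 13, 21, 22, 31, 32, 33]
labels = mask_labels(input_ids, output_start=5)
print(labels)  # [-100, -100, -100, -100, -100, 31, 32, 33]
```

Only the last three positions contribute to the loss, so gradient updates come from the final structured output rather than from the prompt or the reasoning tokens.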

Training Details

The adapter was trained on the u-10bei/structured_data_with_cot_dataset_512_v2 dataset, which is distributed under the MIT License. Key training configurations include a maximum sequence length of 512, 1 epoch, a learning rate of 3e-06, and LoRA parameters of r=64, alpha=128.
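For reference, the reported hyperparameters can be collected in a small configuration object. This is a sketch only: `TrainingConfig` is a hypothetical container, not part of Unsloth or PEFT, and the `lora_scaling` property assumes the standard LoRA convention of scaling the low-rank update by alpha / r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as reported in the training details."""

    max_seq_length: int = 512
    num_epochs: int = 1
    learning_rate: float = 3e-6
    lora_r: int = 64
    lora_alpha: int = 128

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA scaling factor, alpha / r.
        return self.lora_alpha / self.lora_r


config = TrainingConfig()
print(config.lora_scaling)  # 2.0
```

Under that convention, r=64 with alpha=128 scales the adapter update by a factor of 2.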

Good For

Applications requiring high accuracy in structured data generation.
Tasks involving the creation of JSON, YAML, XML, TOML, or CSV outputs from natural language prompts.
Developers looking to integrate a specialized model for structured data extraction or generation without the overhead of a full model fine-tune.
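When integrating a model for structured generation, it can help to check that each response actually parses before passing it downstream. The sketch below uses only the Python standard library and covers JSON, XML, and CSV; YAML and TOML are left out because they need a third-party parser or a newer Python version. `is_valid` is a hypothetical helper, and the CSV rule (every row has the same number of columns) is an assumption about what counts as well-formed.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def is_valid(text: str, fmt: str) -> bool:
    """Return True if text parses as the given format ('json', 'xml', or 'csv')."""
    fmt = fmt.lower()
    try:
        if fmt == "json":
            json.loads(text)
        elif fmt == "xml":
            ET.fromstring(text)
        elif fmt == "csv":
            rows = list(csv.reader(io.StringIO(text)))
            # Require at least one row and the same column count in every row.
            if not rows or len({len(row) for row in rows}) != 1:
                return False
        else:
            raise ValueError(f"unsupported format: {fmt}")
    except (json.JSONDecodeError, ET.ParseError, csv.Error):
        return False
    return True


print(is_valid('{"name": "Ada"}', "json"))  # True
print(is_valid('{"name": "Ada"', "json"))   # False
```

A check like this only confirms syntax; validating field names or types would need a schema on top of it.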
