Ryu19940329/dpo-qwen-cot-merged
Ryu19940329/dpo-qwen-cot-merged is a 4-billion parameter LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, Unsloth). This adapter is specifically trained to enhance structured output accuracy for formats like JSON, YAML, XML, TOML, and CSV. It applies loss only to the final assistant output, masking intermediate Chain-of-Thought reasoning to optimize for direct structured responses.
Loading preview...
Overview
This repository provides a LoRA adapter developed by Ryu19940329, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. The fine-tuning was performed using QLoRA (4-bit, Unsloth), and the repository contains only the adapter weights, requiring the base model to be loaded separately.
Key Capabilities
- Enhanced Structured Output: The primary objective of this adapter is to significantly improve the accuracy of structured outputs, supporting formats such as JSON, YAML, XML, TOML, and CSV.
- Targeted Loss Application: During training, loss is exclusively applied to the final assistant output, with intermediate Chain-of-Thought reasoning being masked. This approach focuses the model's learning on generating precise structured responses.
Training Details
The adapter was trained on the u-10bei/structured_data_with_cot_dataset_512_v2 dataset, which is distributed under the MIT License. Key training configurations include a maximum sequence length of 512, 1 epoch, a learning rate of 3e-06, and LoRA parameters of r=64, alpha=128.
Good For
- Applications requiring high accuracy in structured data generation.
- Tasks involving the creation of JSON, YAML, XML, TOML, or CSV outputs from natural language prompts.
- Developers looking to integrate a specialized model for structured data extraction or generation without the overhead of a full model fine-tune.