The sonodd/qwen3-4b-structeval-dpo-v2-sft-merged model is a 4 billion parameter Qwen3-based language model fine-tuned using Direct Preference Optimization (DPO). It is specifically optimized to enhance the quality of structured outputs such as JSON, YAML, XML, TOML, and CSV. This model builds upon a previously fine-tuned version, sonodd/qwen3-4b-structeval-sft-v4-lr2e5-merged, and is designed for applications requiring precise and well-formatted data generation.
Model Overview
This model, sonodd/qwen3-4b-structeval-dpo-v2-sft-merged, is a 4 billion parameter language model based on the Qwen3 architecture. It has been further fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, building upon a prior Supervised Fine-Tuning (SFT) phase. The primary objective of this DPO fine-tuning was to significantly improve the model's ability to generate high-quality structured outputs.
Key Capabilities
- Enhanced Structured Output: Specifically optimized for generating accurate and well-formatted structured data, including JSON, YAML, XML, TOML, and CSV.
- DPO Fine-tuning: Leverages Direct Preference Optimization to align responses with preferred output formats, improving consistency and correctness.
- Merged Weights: Provided as a fully merged 16-bit model, eliminating the need for adapter loading and simplifying deployment with transformers.
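Because the weights are fully merged, the model can be loaded like any standard causal LM. The sketch below shows one plausible way to do this with transformers; the generation settings (greedy decoding, token budget) and the single-turn chat wrapping are illustrative assumptions, not documented defaults of this model.

```python
MODEL_ID = "sonodd/qwen3-4b-structeval-dpo-v2-sft-merged"

def build_messages(instruction: str) -> list[dict]:
    # Wrap a single-turn instruction in the chat format used by apply_chat_template.
    return [{"role": "user", "content": instruction}]

def generate_structured(instruction: str, max_new_tokens: int = 512) -> str:
    # transformers is imported here so the lightweight helper above stays usable
    # without pulling in the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(instruction), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For structured-output tasks, an explicit instruction such as "Respond with JSON only" in the prompt typically helps keep the output machine-parsable.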
Training Details
The model was trained for 1 epoch with a learning rate of 1e-07, a DPO beta of 0.1, and a maximum sequence length of 1024, using the u-10bei/dpo-dataset-qwen-cot dataset. The model is released under the MIT License, consistent with its training data.
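The hyperparameters above can be expressed as a TRL-style configuration. This is a hedged reconstruction for illustration only: the card does not publish the exact training script, and argument names follow `trl.DPOConfig`, which Unsloth's DPO workflow builds on.

```python
# Illustrative sketch of the reported DPO hyperparameters, expressed as a
# trl.DPOConfig. The output_dir name is a placeholder assumption.
from trl import DPOConfig

dpo_config = DPOConfig(
    output_dir="qwen3-4b-structeval-dpo",  # assumed, not from the card
    num_train_epochs=1,      # 1 epoch, as reported
    learning_rate=1e-7,      # reported learning rate
    beta=0.1,                # DPO preference-strength coefficient
    max_length=1024,         # reported maximum sequence length
)
```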
When to Use This Model
This model is particularly well-suited for applications where the generation of precise and syntactically correct structured data is critical. Consider using it for tasks such as:
- Generating API responses in JSON format.
- Creating configuration files in YAML or TOML.
- Extracting structured information into CSV or XML.
- Any scenario requiring reliable, formatted text output from an LLM.
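For JSON use cases like those above, even a well-aligned model benefits from a validation-and-retry layer in the calling code. The stdlib-only sketch below is a generic, model-agnostic pattern, not part of this model's API; `call_model` is a hypothetical placeholder for any generation function, such as one wrapping this model.

```python
import json

def extract_json(text: str):
    """Parse a JSON payload from model output, tolerating a leading code fence."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop a leading ```json / ``` fence; the trailing fence is split off too.
        cleaned = cleaned.split("```")[1]
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    return json.loads(cleaned)

def generate_valid_json(call_model, prompt: str, retries: int = 2):
    """Call the model and re-try until the output parses as valid JSON."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return extract_json(call_model(prompt))
        except json.JSONDecodeError as exc:
            last_error = exc
    raise ValueError(f"No valid JSON after {retries + 1} attempts") from last_error
```

The same pattern extends to YAML, TOML, or XML by swapping in the corresponding parser for `json.loads`.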