Overview
This model, deepkick/qwen3-4b-struct-dpo-v14-b0.10-L2048-merged, is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library to significantly improve its ability to generate stable and schema-compliant structured responses.
Key Capabilities
- Enhanced Structured Output: Optimized specifically for generating responses that adhere to predefined schemas and maintain structural integrity.
- DPO Fine-tuning: Utilizes Direct Preference Optimization to align model outputs with desired structured formats based on a preference dataset.
- Merged Weights: Provided as fully merged 16-bit weights, simplifying deployment since no separate adapter loading is required.
- Base Model: Built upon the robust Qwen3-4B-Instruct architecture.
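Because the weights are fully merged, the model can be loaded like any standard causal LM. A minimal sketch using the Hugging Face transformers API (the exact dtype and device settings are assumptions; adjust for your hardware):

```python
MODEL_ID = "deepkick/qwen3-4b-struct-dpo-v14-b0.10-L2048-merged"

def load_model():
    # Import inside the function so the sketch stays self-contained.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # No PEFT/adapter step is needed: the LoRA weights are already merged.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # loads the stored 16-bit weights
        device_map="auto",    # places layers on available devices
    )
    return tokenizer, model
```

From here, generation follows the usual Qwen3 chat-template workflow (apply_chat_template, then generate).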
Training Details
The model underwent 1 epoch of DPO training with a learning rate of 2e-07 and a beta value of 0.1, using a maximum sequence length of 2048 tokens. Training used a LoRA adapter (r=32, alpha=64) that was subsequently merged into the base model. The training data, u-10bei/structured_data_with_cot_dataset_512_v2, is licensed under the MIT License.
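For reference, DPO fits the policy directly to the preference pairs by minimizing the standard DPO loss, where the beta value above (0.1) controls how far the policy may drift from the reference model:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Here $y_w$ and $y_l$ are the preferred (schema-compliant) and rejected responses for prompt $x$, $\pi_{\mathrm{ref}}$ is the frozen base model, and $\sigma$ is the sigmoid.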
Ideal Use Cases
This model is particularly well-suited for applications where reliable and consistent structured data output is critical. Consider using this model for:
- Generating JSON, XML, or other structured data formats from natural language prompts.
- Tasks requiring strict adherence to output schemas.
- Automated data extraction and formatting where output stability is paramount.
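Even with a model tuned for structural stability, it is good practice to validate generated output before passing it downstream. A minimal sketch of such a check, using a hypothetical schema and a hypothetical model response (the field names and the raw string are assumptions for illustration, not outputs of this model):

```python
import json

# Hypothetical schema for illustration: required field names and types.
REQUIRED_FIELDS = {"name": str, "age": int}

def validate_output(raw: str) -> dict:
    """Parse a model response and verify it matches the expected schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data

# Hypothetical model output, shown only to illustrate the happy path:
record = validate_output('{"name": "Alice", "age": 30}')
```

Rejecting and regenerating on validation failure gives a simple guardrail around the model's structured-output guarantees.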