deepkick/qwen3-4b-struct-dpo-v14-b0.10-L2048-merged
The deepkick/qwen3-4b-struct-dpo-v14-b0.10-L2048-merged model is a 4 billion parameter Qwen3-based language model, fine-tuned by deepkick using Direct Preference Optimization (DPO) via Unsloth. It is specifically optimized to enhance structured response stability and schema adherence, making it suitable for applications requiring precise output formats. This model features full-merged 16-bit weights and supports a maximum sequence length of 2048 tokens, focusing on reliable structured data generation.
Loading preview...
Overview
This model, deepkick/qwen3-4b-struct-dpo-v14-b0.10-L2048-merged, is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library to significantly improve its ability to generate stable and schema-compliant structured responses.
Key Capabilities
- Enhanced Structured Output: Optimized specifically for generating responses that adhere to predefined schemas and maintain structural integrity.
- DPO Fine-tuning: Utilizes Direct Preference Optimization to align model outputs with desired structured formats based on a preference dataset.
- Merged Weights: Provided as full-merged 16-bit weights, simplifying deployment as no adapter loading is required.
- Base Model: Built upon the robust Qwen3-4B-Instruct architecture.
Training Details
The model underwent 1 epoch of DPO training with a learning rate of 2e-07 and a beta value of 0.1. It was trained with a maximum sequence length of 2048 tokens, using a LoRA configuration of r=32, alpha=64, which has been merged into the base model. The training data, u-10bei/structured_data_with_cot_dataset_512_v2, is licensed under the MIT License.
Ideal Use Cases
This model is particularly well-suited for applications where reliable and consistent structured data output is critical. Consider using this model for:
- Generating JSON, XML, or other structured data formats from natural language prompts.
- Tasks requiring strict adherence to output schemas.
- Automated data extraction and formatting where output stability is paramount.