deepkick/qwen3-4b-struct-dpo-v05-merged

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 6, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The deepkick/qwen3-4b-struct-dpo-v05-merged is a 4 billion parameter language model based on Qwen/Qwen3-4B-Instruct-2507, fine-tuned using Direct Preference Optimization (DPO). It is specifically optimized for enhanced structured response stability and adherence to schema, making it suitable for applications requiring consistent output formats. This model features a 40960 token context length and provides full-merged 16-bit weights for direct use.

Loading preview...

Model Overview

The deepkick/qwen3-4b-struct-dpo-v05-merged is a 4 billion parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, focusing on aligning its responses with preferred outputs.

Key Capabilities

  • Enhanced Structured Output: The primary optimization goal was to improve the model's stability in generating structured responses and its adherence to specified schemas. This makes it particularly effective for tasks requiring consistent data formats.
  • DPO Fine-tuning: Leverages Direct Preference Optimization to align model behavior with desired output characteristics, based on a preference dataset.
  • Merged Weights: Provided as full-merged 16-bit weights, eliminating the need for adapter loading and simplifying deployment.
  • Base Model: Built upon the Qwen3-4B-Instruct architecture, inheriting its general language understanding and generation capabilities.

Training Details

The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.05, using a maximum sequence length of 768. The training data utilized was u-10bei/dpo-dataset-qwen-cot.

Good For

  • Applications requiring reliable and consistent structured data output.
  • Tasks where adherence to specific JSON or other schema formats is critical.
  • Developers looking for a Qwen3-based model with improved control over output structure.