Name: deepkick/qwen3-4b-struct-dpo-v05-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: deepkick

Model Overview

The deepkick/qwen3-4b-struct-dpo-v05-merged is a 4 billion parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, focusing on aligning its responses with preferred outputs.

Key Capabilities

Enhanced Structured Output: The primary optimization goal was to improve the model's stability in generating structured responses and its adherence to specified schemas. This makes it particularly effective for tasks requiring consistent data formats.
DPO Fine-tuning: Leverages Direct Preference Optimization to align model behavior with desired output characteristics, based on a preference dataset.
Merged Weights: Provided as full-merged 16-bit weights, eliminating the need for adapter loading and simplifying deployment.
Base Model: Built upon the Qwen3-4B-Instruct architecture, inheriting its general language understanding and generation capabilities.

Training Details

The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.05, using a maximum sequence length of 768. The training data utilized was u-10bei/dpo-dataset-qwen-cot.

Good For

Applications requiring reliable and consistent structured data output.
Tasks where adherence to specific JSON or other schema formats is critical.
Developers looking for a Qwen3-based model with improved control over output structure.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)