deepkick/qwen3-4b-struct-dpo-v11-merged
deepkick/qwen3-4b-struct-dpo-v11-merged is a 4-billion-parameter language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO) via Unsloth. The model is optimized for structured response stability and schema adherence, making it suitable for tasks that require precise output formats. It ships as full-merged 16-bit weights and inherits a 40,960-token context length, supporting complex, structured data generation.
Overview
This model, deepkick/qwen3-4b-struct-dpo-v11-merged, is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library to improve its ability to produce stable, schema-compliant structured responses.
Key Capabilities
- Structured Response Generation: Optimized specifically for generating outputs that adhere to predefined structures and schemas.
- DPO Fine-tuning: Leverages Direct Preference Optimization for improved response quality based on preference data.
- Merged Weights: Provided as full-merged 16-bit weights, eliminating the need for adapter loading and simplifying deployment.
- Base Model Context: Inherits the 40,960-token context length from its Qwen3-4B-Instruct base model.
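Because the weights are fully merged, the model can be loaded like any standard causal LM in Hugging Face transformers, with no PEFT/adapter step. The sketch below is illustrative, not an official usage snippet from the repo: the loading function assumes transformers is installed and enough memory for a 4B model, and the prompt-building helper and its schema string are hypothetical examples of how one might request schema-constrained output.

```python
# Hedged sketch: loading the full-merged 16-bit weights (no adapter loading
# needed). The function is defined but not called here, since it downloads
# a 4B-parameter model.
def load_merged_model(model_id: str = "deepkick/qwen3-4b-struct-dpo-v11-merged"):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # picks up the 16-bit dtype stored in the repo
        device_map="auto",
    )
    return model, tokenizer


# Hypothetical helper: a chat-style request asking for output that matches
# a given schema, the kind of task this model is tuned for.
def build_messages(schema: str, task: str) -> list[dict]:
    return [
        {"role": "system",
         "content": f"Respond only with JSON that matches this schema: {schema}"},
        {"role": "user", "content": task},
    ]
```

The messages list can then be passed through `tokenizer.apply_chat_template(...)` before generation, as with any Qwen3 chat model.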
Training Details
The model was trained for 1 epoch with a learning rate of 2e-7, a DPO beta of 0.05, and a maximum sequence length of 1536 tokens. Training used the u-10bei/structured_data_with_cot_dataset_512_v2 dataset, which is licensed under the MIT License.
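The reported hyperparameters can be written out in the shape that TRL-style DPO training arguments take; this is a reconstruction from the numbers above, not the actual training script, and the argument names follow common TRL/Unsloth conventions.

```python
# Hedged sketch: the DPO hyperparameters stated in this card, expressed as a
# plain dict using standard TRL-style argument names (the exact training
# script is an assumption, only the values come from the card).
dpo_hparams = {
    "num_train_epochs": 1,
    "learning_rate": 2e-7,
    "beta": 0.05,        # DPO preference-strength coefficient
    "max_length": 1536,  # maximum sequence length during training
}
```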
Good For
- Applications requiring reliable structured data output.
- Tasks where schema adherence is critical.
- Developers looking for a Qwen3-based model with enhanced response format consistency.
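For the use cases above, it is common to validate model replies against the expected structure before consuming them downstream. The minimal checker below is an illustrative simplification (a "schema" reduced to a set of required top-level keys), not part of the model or its tooling.

```python
import json

# Hedged sketch: a post-hoc check that a model reply is valid JSON containing
# the required top-level keys -- the kind of validation this model's
# structured-output tuning is meant to make pass reliably.
def validate_response(reply: str, required_keys: set[str]) -> bool:
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()
```

In production, a full JSON Schema validator (e.g. the `jsonschema` package) would replace this required-keys check.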