Overview
This model, deepkick/qwen3-4b-struct-dpo-v14-b0.10-L2048-merged, is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library to significantly improve its ability to generate stable and schema-compliant structured responses.
Key Capabilities
- Enhanced Structured Output: Optimized specifically for generating responses that adhere to predefined schemas and maintain structural integrity.
- DPO Fine-tuning: Utilizes Direct Preference Optimization to align model outputs with desired structured formats based on a preference dataset.
- Merged Weights: Provided as fully merged 16-bit weights, simplifying deployment since no separate adapter loading is required.
- Base Model: Built upon the robust Qwen3-4B-Instruct architecture.
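Because the weights are fully merged, the model can be loaded like any standard causal LM. A minimal sketch using the Hugging Face transformers API (the exact dtype and device settings are assumptions; adjust for your hardware):

```python
MODEL_ID = "deepkick/qwen3-4b-struct-dpo-v14-b0.10-L2048-merged"

def load_model():
    # Import inside the function so the sketch stays self-contained.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # No PEFT/adapter step is needed: the LoRA weights are already merged.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # loads the stored 16-bit weights
        device_map="auto",    # places layers on available devices
    )
    return tokenizer, model
```

From here, generation follows the usual Qwen3 chat-template workflow (apply_chat_template, then generate).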
Training Details
The model underwent 1 epoch of DPO training with a learning rate of 2e-07 and a beta value of 0.1, using a maximum sequence length of 2048 tokens. Training used a LoRA adapter (r=32, alpha=64) that was subsequently merged into the base model. The training data, u-10bei/structured_data_with_cot_dataset_512_v2, is licensed under the MIT License.
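For reference, DPO fits the policy directly to the preference pairs by minimizing the standard DPO loss, where the beta value above (0.1) controls how far the policy may drift from the reference model:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Here $y_w$ and $y_l$ are the preferred (schema-compliant) and rejected responses for prompt $x$, $\pi_{\mathrm{ref}}$ is the frozen base model, and $\sigma$ is the sigmoid.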
Ideal Use Cases
This model is particularly well-suited for applications where reliable and consistent structured data output is critical. Consider using this model for:
- Generating JSON, XML, or other structured data formats from natural language prompts.
- Tasks requiring strict adherence to output schemas.
- Automated data extraction and formatting where output stability is paramount.
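Even with a model tuned for structural stability, it is good practice to validate generated output before passing it downstream. A minimal sketch of such a check, using a hypothetical schema and a hypothetical model response (the field names and the raw string are assumptions for illustration, not outputs of this model):

```python
import json

# Hypothetical schema for illustration: required field names and types.
REQUIRED_FIELDS = {"name": str, "age": int}

def validate_output(raw: str) -> dict:
    """Parse a model response and verify it matches the expected schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data

# Hypothetical model output, shown only to illustrate the happy path:
record = validate_output('{"name": "Alice", "age": 30}')
```

Rejecting and regenerating on validation failure gives a simple guardrail around the model's structured-output guarantees.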