SunTaiyo/qwen3-4b-structured-output-dpo
TEXT GENERATION · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Feb 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

SunTaiyo/qwen3-4b-structured-output-dpo is a 4-billion-parameter language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO) via Unsloth. It is optimized to strengthen Chain-of-Thought reasoning and improve the quality of structured responses, producing outputs aligned with the preference dataset and making it suitable for tasks that require precise, structured text generation.


Overview

This model, SunTaiyo/qwen3-4b-structured-output-dpo, is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library to align its responses with preferred outputs.

Key Capabilities

  • Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning, leading to more logical and coherent outputs.
  • Structured Response Quality: Specifically trained to generate high-quality structured responses, making it ideal for tasks requiring formatted or constrained output.
  • DPO Fine-tuning: Utilizes DPO with a beta of 0.1 and a learning rate of 1e-07, focusing on aligning model behavior with desired preferences.
  • Merged Weights: Shipped as fully merged 16-bit weights, so no adapter loading is needed and deployment is simplified; see the loading sketch below.
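
Because the LoRA adapter is already merged, the model loads like any other causal LM. A minimal sketch using the standard transformers API (the dtype and device_map choices are illustrative, not prescribed by this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SunTaiyo/qwen3-4b-structured-output-dpo"

# No PEFT/adapter step is required: the weights are fully merged.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)
```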

Good For

  • Applications requiring models to follow complex reasoning steps.
  • Generating structured data, such as JSON, XML, or other formatted text (see the example after this list).
  • Tasks where output alignment with specific preferences is critical.
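
A minimal usage sketch for JSON-style extraction, reusing the model and tokenizer loaded above; the system prompt and the schema are illustrative examples, not part of the model card:

```python
# Ask the model for structured output; the prompt and schema are made up
# for illustration.
messages = [
    {"role": "system", "content": "Reply with valid JSON only."},
    {"role": "user", "content": 'Extract {"name": ..., "year": ...} from: '
                                "Qwen3 was released in 2025."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```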

Training Details

The model was trained for one epoch with a maximum sequence length of 1,024 tokens on the u-10bei/dpo-dataset-qwen-cot dataset. Training used a LoRA configuration (r=8, alpha=16) whose adapter was subsequently merged into the base model.
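
For reference, a hedged reconstruction of this setup with Unsloth and TRL's DPOTrainer. The hyperparameters come from this card; the target-module list, output directory, and other defaults are assumptions, and exact argument names vary across TRL versions:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Load the base model named on the card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",
    max_seq_length=1024,  # card: max sequence length 1024
)
model = FastLanguageModel.get_peft_model(
    model,
    r=8,             # card: LoRA r=8
    lora_alpha=16,   # card: LoRA alpha=16
    # Assumed target modules (standard for Qwen-family models; not stated on the card):
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        beta=0.1,              # card: DPO beta
        learning_rate=1e-7,    # card: learning rate
        num_train_epochs=1,    # card: 1 epoch
        output_dir="outputs",  # assumed
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()

# Merge the LoRA adapter into 16-bit weights, matching the published model.
model.save_pretrained_merged("qwen3-4b-dpo-merged", tokenizer,
                             save_method="merged_16bit")
```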