The sonodd/qwen3-4b-structeval-dpo-v2-sft-merged model is a 4 billion parameter Qwen3-based language model fine-tuned using Direct Preference Optimization (DPO). It is specifically optimized to enhance the quality of structured outputs such as JSON, YAML, XML, TOML, and CSV. This model builds upon a previously fine-tuned version, sonodd/qwen3-4b-structeval-sft-v4-lr2e5-merged, and is designed for applications requiring precise and well-formatted data generation.
Model Overview
This model, sonodd/qwen3-4b-structeval-dpo-v2-sft-merged, is a 4 billion parameter language model based on the Qwen3 architecture. It has been further fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, building upon a prior Supervised Fine-Tuning (SFT) phase. The primary objective of this DPO fine-tuning was to significantly improve the model's ability to generate high-quality structured outputs.
Key Capabilities
- Enhanced Structured Output: Specifically optimized for generating accurate and well-formatted structured data, including JSON, YAML, XML, TOML, and CSV.
- DPO Fine-tuning: Leverages Direct Preference Optimization to align responses with preferred output formats, improving consistency and correctness.
- Merged Weights: Provided as a fully merged 16-bit model, eliminating the need for adapter loading and simplifying deployment with transformers.
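Because the weights are fully merged, the model can be loaded like any standard causal LM. The sketch below shows one plausible way to do this with transformers; the generation settings (greedy decoding, token budget) and the single-turn chat wrapping are illustrative assumptions, not documented defaults of this model.

```python
MODEL_ID = "sonodd/qwen3-4b-structeval-dpo-v2-sft-merged"

def build_messages(instruction: str) -> list[dict]:
    # Wrap a single-turn instruction in the chat format used by apply_chat_template.
    return [{"role": "user", "content": instruction}]

def generate_structured(instruction: str, max_new_tokens: int = 512) -> str:
    # transformers is imported here so the lightweight helper above stays usable
    # without pulling in the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(instruction), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For structured-output tasks, an explicit instruction such as "Respond with JSON only" in the prompt typically helps keep the output machine-parsable.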
Training Details
The model was trained for 1 epoch with a learning rate of 1e-07, a DPO beta of 0.1, and a maximum sequence length of 1024, using the u-10bei/dpo-dataset-qwen-cot dataset. The model is released under the MIT License, consistent with its training data.
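The hyperparameters above can be expressed as a TRL-style configuration. This is a hedged reconstruction for illustration only: the card does not publish the exact training script, and argument names follow `trl.DPOConfig`, which Unsloth's DPO workflow builds on.

```python
# Illustrative sketch of the reported DPO hyperparameters, expressed as a
# trl.DPOConfig. The output_dir name is a placeholder assumption.
from trl import DPOConfig

dpo_config = DPOConfig(
    output_dir="qwen3-4b-structeval-dpo",  # assumed, not from the card
    num_train_epochs=1,      # 1 epoch, as reported
    learning_rate=1e-7,      # reported learning rate
    beta=0.1,                # DPO preference-strength coefficient
    max_length=1024,         # reported maximum sequence length
)
```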
When to Use This Model
This model is particularly well-suited for applications where the generation of precise and syntactically correct structured data is critical. Consider using it for tasks such as:
- Generating API responses in JSON format.
- Creating configuration files in YAML or TOML.
- Extracting structured information into CSV or XML.
- Any scenario requiring reliable, formatted text output from an LLM.
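For JSON use cases like those above, even a well-aligned model benefits from a validation-and-retry layer in the calling code. The stdlib-only sketch below is a generic, model-agnostic pattern, not part of this model's API; `call_model` is a hypothetical placeholder for any generation function, such as one wrapping this model.

```python
import json

def extract_json(text: str):
    """Parse a JSON payload from model output, tolerating a leading code fence."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop a leading ```json / ``` fence; the trailing fence is split off too.
        cleaned = cleaned.split("```")[1]
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    return json.loads(cleaned)

def generate_valid_json(call_model, prompt: str, retries: int = 2):
    """Call the model and re-try until the output parses as valid JSON."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return extract_json(call_model(prompt))
        except json.JSONDecodeError as exc:
            last_error = exc
    raise ValueError(f"No valid JSON after {retries + 1} attempts") from last_error
```

The same pattern extends to YAML, TOML, or XML by swapping in the corresponding parser for `json.loads`.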