sonodd/qwen3-4b-structeval-dpo-v2-sft-merged
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 22, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The sonodd/qwen3-4b-structeval-dpo-v2-sft-merged model is a 4 billion parameter Qwen3-based language model fine-tuned using Direct Preference Optimization (DPO). It is specifically optimized to enhance the quality of structured outputs such as JSON, YAML, XML, TOML, and CSV. This model builds upon a previously fine-tuned version, sonodd/qwen3-4b-structeval-sft-v4-lr2e5-merged, and is designed for applications requiring precise and well-formatted data generation.

Loading preview...

Model Overview

This model, sonodd/qwen3-4b-structeval-dpo-v2-sft-merged, is a 4 billion parameter language model based on the Qwen3 architecture. It has been further fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, building upon a prior Supervised Fine-Tuning (SFT) phase. The primary objective of this DPO fine-tuning was to significantly improve the model's ability to generate high-quality structured outputs.

Key Capabilities

  • Enhanced Structured Output: Specifically optimized for generating accurate and well-formatted structured data, including JSON, YAML, XML, TOML, and CSV.
  • DPO Fine-tuning: Leverages Direct Preference Optimization to align responses with preferred output formats, improving consistency and correctness.
  • Merged Weights: Provided as a full-merged 16-bit model, eliminating the need for adapter loading and simplifying deployment with transformers.

Training Details

The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1, using a maximum sequence length of 1024. The training data utilized was the u-10bei/dpo-dataset-qwen-cot dataset. The model is released under the MIT License, consistent with its training data.

When to Use This Model

This model is particularly well-suited for applications where the generation of precise and syntactically correct structured data is critical. Consider using it for tasks such as:

  • Generating API responses in JSON format.
  • Creating configuration files in YAML or TOML.
  • Extracting structured information into CSV or XML.
  • Any scenario requiring reliable, formatted text output from an LLM.