Name: masachika/qwen3-4b-dpo-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: masachika

Model Overview

The masachika/qwen3-4b-dpo-cot-merged is a 4 billion parameter language model built upon the Qwen3-4B-Instruct-2507 base. It has been meticulously fine-tuned through a two-stage process to enhance its capabilities in generating structured outputs and improving reasoning.

Key Capabilities

Structured Output Generation: Initially fine-tuned (SFT) to produce various structured data formats, including JSON, YAML, XML, TOML, and CSV.
Improved Reasoning and Alignment: Further optimized using Direct Preference Optimization (DPO) with a specialized dataset (u-10bei/dpo-dataset-qwen-cot) to align responses with preferred outputs and boost reasoning quality.
Full-Merged Weights: This repository provides the full-merged 16-bit weights, eliminating the need for adapter loading and simplifying deployment.

Training Details

The model's development involved:

Stage 1 (SFT): Supervised Fine-Tuning on Qwen/Qwen3-4B-Instruct-2507 using masachika/qwen3-4b-Instruct-2507-structured-output-lora to teach structured output generation.
Stage 2 (DPO): Direct Preference Optimization on the SFT-merged model, focusing on aligning responses and improving reasoning over 2 epochs with a learning rate of 3e-07 and a max sequence length of 2048.

Good For

Applications requiring precise, structured data output (e.g., API response generation, configuration file creation).
Tasks benefiting from enhanced reasoning and aligned, high-quality responses.
Developers seeking a readily deployable 4B parameter model with specialized fine-tuning for structured generation and improved coherence.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)