KawausoHiroKawauso/dpo-qwen-cot-merged is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). It is optimized for improved reasoning, particularly Chain-of-Thought (CoT), and for generating high-quality structured responses. Training against a preference dataset aligns its outputs with desired formats and logical flows, making it suitable for tasks that require structured, reasoned answers.
Model Overview
This model, KawausoHiroKawauso/dpo-qwen-cot-merged, is a 4-billion-parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library, and the fine-tuned weights have been fully merged into the base model at 16-bit precision, so no adapter loading is required.
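Because the weights are merged, the checkpoint loads like any standalone model with `transformers` — no PEFT or adapter step. A minimal sketch (the repository id comes from this card; the dtype and device settings are illustrative defaults):

```python
MODEL_ID = "KawausoHiroKawauso/dpo-qwen-cot-merged"

def load_model():
    """Load the merged checkpoint directly; no adapter attachment is needed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",  # pick up the merged 16-bit weights as stored
        device_map="auto",   # place layers on the available GPU(s)/CPU
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
    messages = [{"role": "user", "content": "Explain step by step: what is 17 * 24?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```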
Key Optimizations
The primary objective of this DPO fine-tuning was to enhance the model's ability to generate improved reasoning (Chain-of-Thought) and produce high-quality structured responses. This was achieved by aligning the model's outputs with a specific preference dataset (u-10bei/dpo-dataset-qwen-cot).
Training Configuration
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Method: Direct Preference Optimization (DPO)
- Epochs: 1
- Learning Rate: 1e-05
- Max Sequence Length: 1024
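The configuration above corresponds to a standard DPO run. A hedged sketch using TRL's `DPOTrainer` (hyperparameters and dataset id are taken from this card; batch size and other settings are illustrative, and the Unsloth-specific setup is omitted):

```python
BASE_MODEL = "Qwen/Qwen3-4B-Instruct-2507"   # base model, from this card
DATASET = "u-10bei/dpo-dataset-qwen-cot"     # preference dataset, from this card

def build_trainer():
    """Assemble a DPO trainer mirroring the configuration listed above."""
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    dataset = load_dataset(DATASET, split="train")
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto")

    args = DPOConfig(
        num_train_epochs=1,    # epochs, from this card
        learning_rate=1e-5,    # learning rate, from this card
        max_length=1024,       # max sequence length, from this card
        output_dir="dpo-qwen-cot",  # illustrative output path
    )
    return DPOTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        processing_class=tokenizer,
    )
```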
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Enhanced Reasoning: Tasks that benefit from explicit, step-by-step logical deductions.
- Structured Output Generation: Scenarios where responses need to adhere to specific formats or structures.
- Preference Alignment: Use cases where model outputs should closely match human-preferred examples for quality and coherence.
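As an illustration of the structured-output use case, a prompt can pin down the expected response shape explicitly. The schema and wording below are illustrative, not a format the model is guaranteed to follow:

```python
import json

def build_structured_prompt(question: str) -> list:
    """Build a chat message list asking for step-by-step reasoning plus a
    final answer in a fixed JSON shape. The schema here is illustrative."""
    schema = {"steps": ["<step 1>", "<step 2>"], "answer": "<final answer>"}
    system = (
        "Reason step by step, then reply ONLY with JSON matching this shape:\n"
        + json.dumps(schema, indent=2)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_structured_prompt("What is 17 * 24?")
```

The resulting `messages` list can be passed to the tokenizer's chat template as usual; the DPO alignment toward structured responses makes the model more likely to honor such format instructions.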