Model Overview
TakaYamamoto/dpo-qwen-cot-merged_biya is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library, and the tuned weights were merged back into the base model and saved in full 16-bit precision, so no adapter loading is required. The primary objective of this optimization was to align the model's responses with preferred outputs, specifically enhancing its Chain-of-Thought reasoning and the overall quality of structured responses.
Key Capabilities
- Enhanced Reasoning: Optimized for Chain-of-Thought (CoT) processes, leading to improved logical progression in responses.
- Structured Output Quality: Fine-tuned to produce higher quality and more structured outputs based on preference datasets.
- Direct Use: As a fully merged model, it can be used directly with transformers without additional adapter loading.
- Qwen3-4B Base: Leverages the robust architecture and capabilities of the Qwen3-4B-Instruct-2507 base model.
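Because the weights are fully merged, loading follows the standard transformers pattern. A minimal usage sketch (assuming transformers with Qwen3 support and torch are installed; the example question is illustrative):

```python
MODEL_ID = "TakaYamamoto/dpo-qwen-cot-merged_biya"

def build_messages(question: str) -> list[dict]:
    # Qwen3 instruct models use the standard chat-message format;
    # a system prompt is optional and omitted here.
    return [{"role": "user", "content": question}]

def generate(question: str, max_new_tokens: int = 512) -> str:
    # Imports kept local so the message helper above can be used
    # without the heavyweight dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Fully merged 16-bit weights: no PEFT/adapter loading step needed.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate("A train travels 60 km in 45 minutes. What is its average speed in km/h?")` should elicit step-by-step reasoning before the final answer.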
Training Details
The model underwent 3 epochs of DPO training with a learning rate of 1e-7, a DPO beta of 0.05, and a maximum sequence length of 4096 tokens. Training used the [u-10bei/dpo-dataset-qwen-cot] preference dataset.
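The card states that DPO was run via Unsloth, which builds on TRL's DPO implementation; as a hyperparameter sketch only, the stated settings map onto TRL's `DPOConfig` roughly as follows (the output directory name is an assumption, not from the card):

```python
# Hyperparameter sketch: not the exact training script, just the
# settings stated above expressed as a TRL DPOConfig.
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="dpo-qwen-cot",   # assumed name, not from the card
    num_train_epochs=3,          # 3 epochs of DPO training
    learning_rate=1e-7,          # learning rate from the card
    beta=0.05,                   # DPO beta (strength of the KL-style penalty)
    max_length=4096,             # maximum sequence length in tokens
)
# A DPOTrainer would then pair this config with the base model
# Qwen/Qwen3-4B-Instruct-2507 and the u-10bei/dpo-dataset-qwen-cot
# preference dataset (prompt / chosen / rejected pairs).
```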
Good For
- Applications requiring improved reasoning and logical coherence.
- Tasks where structured and high-quality responses are critical.
- Developers seeking a Qwen3-4B variant with enhanced alignment to preferred outputs.