Name: tatsuji1962/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: tatsuji1962

Model Overview

The tatsuji1962/dpo-qwen-cot-merged model is a specialized 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has undergone Direct Preference Optimization (DPO) using the Unsloth library, integrating the full 16-bit weights directly without requiring adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, making it suitable for tasks requiring logical progression and problem-solving.
Improved Structured Responses: Fine-tuned to produce higher quality and more aligned structured outputs based on preference datasets.
Direct Usage: As a fully merged model, it can be used directly with the transformers library, simplifying deployment.

Training Details

The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1, using a maximum sequence length of 1024. The training leveraged the u-10bei/dpo-dataset-qwen-cot dataset, focusing on aligning model responses with preferred outputs. The base model's license terms must be followed, and the merged model itself is released under the MIT License.

Overview

Model Overview

Key Capabilities

Training Details

Full Model Card (README)