takayosh/dpo-qwen-cot-merged

Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Feb 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

The takayosh/dpo-qwen-cot-merged model is a 4-billion-parameter Qwen3-based causal language model, fine-tuned with Direct Preference Optimization (DPO) via Unsloth. It is optimized to strengthen Chain-of-Thought (CoT) reasoning and the quality of structured responses, and is intended for applications that require logical coherence and adherence to preferred output formats.


Overview

This model, takayosh/dpo-qwen-cot-merged, is a 4 billion parameter language model built upon the Qwen/Qwen3-4B-Instruct-2507 base. It has been fine-tuned using Direct Preference Optimization (DPO), leveraging the Unsloth library to align its outputs with preferred responses. The fine-tuning process focused on enhancing the model's reasoning abilities (Chain-of-Thought) and its capacity to generate structured, high-quality responses.
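
For readers curious how a merge like this is typically produced, the sketch below shows a representative Unsloth + TRL DPO fine-tuning loop. Only the base model and dataset names come from this card; the LoRA rank, beta, learning rate, and other hyperparameters are illustrative assumptions, not the author's actual training configuration.

```python
# Hypothetical sketch of a DPO fine-tune like the one described above.
# Hyperparameters (LoRA rank, beta, learning rate) are assumptions;
# only the base model and dataset names come from this card.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",
    max_seq_length=1024,   # matches the training sequence length on this card
    load_in_4bit=True,     # assumption: QLoRA-style training to fit memory
)

# Attach LoRA adapters; DPO trains these, and they are later merged into
# the base weights to produce the full 16-bit model distributed here.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# A DPO preference dataset needs "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        beta=0.1,                      # DPO KL-penalty strength (assumed)
        per_device_train_batch_size=2,
        learning_rate=5e-6,
        output_dir="dpo-qwen-cot",
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()

# Merge the trained adapters into the base model and save 16-bit weights.
model.save_pretrained_merged("dpo-qwen-cot-merged", tokenizer,
                             save_method="merged_16bit")
```

The final merge step is what makes the published weights self-contained: downstream users load a single checkpoint rather than a base model plus adapter.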

Key Characteristics

  • Base Model: Qwen/Qwen3-4B-Instruct-2507.
  • Fine-tuning Method: Direct Preference Optimization (DPO) for improved alignment.
  • Optimization Focus: Enhanced reasoning (Chain-of-Thought) and structured output quality.
  • Weights: Fully merged 16-bit weights; no adapter loading is required.
  • Training Data: Utilized the u-10bei/dpo-dataset-qwen-cot dataset.
  • Context Length: Trained with a maximum sequence length of 1,024 tokens; the base model supports a 40,960-token context.

Good For

  • Applications requiring models with improved logical reasoning and step-by-step thought processes.
  • Scenarios where structured and coherent responses are critical.
  • Developers looking for a Qwen3-based model with enhanced alignment to preferred output styles.

Usage

Because the weights are fully merged, the model can be loaded directly with the transformers library; no separate adapter loading step is required.
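
A minimal loading-and-generation example with transformers is shown below. The model ID comes from this card; the prompt and generation settings are illustrative, not recommendations from the model author.

```python
# Minimal sketch: load the merged checkpoint and generate a response.
# Prompt and sampling settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "takayosh/dpo-qwen-cot-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve step by step: what is 17 * 24?"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Using the tokenizer's chat template ensures the prompt matches the instruction format the Qwen3 base model was trained on.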