Name: ryosao/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ryosao

Model Overview

ryosao/dpo-qwen-cot-merged is a 4 billion parameter language model built upon the Qwen3-4B-Instruct-2507 base model. It has been fine-tuned by ryosao using Direct Preference Optimization (DPO) via the Unsloth library, with its 16-bit weights fully merged for direct use without adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, making it suitable for complex problem-solving and logical deduction tasks.
Improved Response Quality: Aligned through DPO to produce higher quality and more structured outputs, based on a preference dataset (u-10bei/dpo-dataset-qwen-cot).
Direct Usage: As a fully merged model, it can be loaded and used directly with the transformers library, simplifying deployment.

Good For

Applications requiring models with strong reasoning abilities.
Use cases where structured and high-quality responses are critical.
Developers looking for a Qwen3-based model with DPO-enhanced alignment for specific output preferences.

Overview

Model Overview

Key Capabilities

Good For

Full Model Card (README)