shingo2211/dpo-qwen-cot-merged

Task: Text Generation · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Feb 5, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

shingo2211/dpo-qwen-cot-merged is a 4-billion-parameter Qwen3-Instruct model fine-tuned with Direct Preference Optimization (DPO) to improve Chain-of-Thought reasoning and the quality of structured responses. Developed by shingo2211, it was trained using the Unsloth library and is intended for applications where logical flow and coherent, well-structured answers matter.


Model Overview

shingo2211/dpo-qwen-cot-merged is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It was trained with Direct Preference Optimization (DPO) using the Unsloth library, specifically targeting improvements in reasoning capability and the quality of structured responses. The DPO process aligned the model's outputs with preferred examples from the u-10bei/dpo-dataset-qwen-cot dataset, helping it generate more logical and coherent answers.
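For readers who want to reproduce a similar fine-tune, the training setup described above can be sketched as follows. This is an illustrative outline, not the author's exact recipe: the card states Unsloth was used (Unsloth wraps trl's `DPOTrainer`), while this sketch uses plain trl and assumes a recent version where the tokenizer is passed as `processing_class`. All hyperparameters are placeholder assumptions.

```python
# Hedged sketch of a DPO fine-tune in the spirit of this model card.
# Base model and dataset names come from the card; hyperparameters do not.
BASE_MODEL = "Qwen/Qwen3-4B-Instruct-2507"
DPO_DATASET = "u-10bei/dpo-dataset-qwen-cot"

# Illustrative training settings (assumptions, not the published values).
TRAIN_ARGS = dict(
    beta=0.1,                         # DPO temperature: strength of preference fit
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

if __name__ == "__main__":
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="bfloat16")
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    # DPO expects "prompt", "chosen", and "rejected" columns.
    dataset = load_dataset(DPO_DATASET, split="train")

    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir="dpo-qwen-cot", **TRAIN_ARGS),
        train_dataset=dataset,
        processing_class=tokenizer,
    )
    trainer.train()
    # Saving the trained model yields merged full weights (no separate adapter).
    trainer.save_model("dpo-qwen-cot-merged")
```

The heavy training code is guarded under `__main__` so the constants above can be inspected or reused without triggering a download.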

Key Capabilities

  • Enhanced Reasoning (Chain-of-Thought): Optimized to produce more structured and logical thought processes in its responses.
  • Improved Response Quality: Fine-tuned to generate preferred and higher-quality outputs, particularly for structured tasks.
  • Direct Use: Provided as a fully merged 16-bit model, eliminating the need for adapter loading and allowing direct integration with transformers.

Good For

  • Applications requiring models with strong reasoning and logical flow.
  • Use cases where structured and high-quality textual outputs are critical.
  • Developers seeking a Qwen3-based model with enhanced alignment to human preferences for response generation.