The hallomee/dpo-qwen-cot-merged model is a 4-billion-parameter Qwen3-based causal language model, fine-tuned with Direct Preference Optimization (DPO) via Unsloth. It is optimized to strengthen Chain-of-Thought (CoT) reasoning and improve the quality of structured responses, making it suited to applications that require logical coherence and refined output formatting.
Model Overview
The hallomee/dpo-qwen-cot-merged model is a 4-billion-parameter language model built on the Qwen3 architecture. It was fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO), with the Unsloth library providing efficient training. The DPO weights are merged into the base model and stored in 16-bit precision, so no separate adapter loading is required.
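
Because the weights are already merged, the model can be loaded like any standard Hugging Face causal LM, with no PEFT adapter attachment step. The sketch below is a minimal, illustrative example; the dtype and device settings are assumptions, not part of the model card.

```python
# Minimal loading sketch (assumes transformers + a CUDA or CPU setup;
# dtype/device choices are illustrative, not documented by the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hallomee/dpo-qwen-cot-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # merged 16-bit weights; use float16 if bf16 is unsupported
    device_map="auto",           # requires `accelerate`; omit to load on a single device
)
```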
Key Capabilities
- Enhanced Reasoning (Chain-of-Thought): Optimized to generate logical, step-by-step reasoning (see the generation sketch after this list).
- Improved Structured Responses: Focuses on delivering higher quality and more coherent structured outputs.
- DPO Fine-tuning: Benefits from preference-based learning to align responses with desired output characteristics.
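
Continuing from the loading sketch above, the snippet below shows one way to elicit step-by-step reasoning through the chat template. It assumes the tokenizer ships a Qwen-style chat template; the prompt wording and sampling parameters are illustrative.

```python
# Hedged generation sketch: reuses `tokenizer` and `model` from the loading example.
messages = [
    {"role": "user",
     "content": "A train travels 120 km in 1.5 hours. What is its average speed? "
                "Think step by step."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens (the step-by-step answer).
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```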
Good For
- Applications requiring robust reasoning abilities.
- Tasks where structured and high-quality output formatting is crucial.
- Scenarios benefiting from models fine-tuned with Direct Preference Optimization for better alignment.