sallm/dpo_qm3_3_step20_qwen-cot-merged

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Feb 15, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The sallm/dpo_qm3_3_step20_qwen-cot-merged model is a 4 billion parameter language model based on Qwen/Qwen3-4B-Instruct-2507, fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library. It is optimized to strengthen reasoning capabilities, particularly Chain-of-Thought (CoT), and to improve the quality of structured responses. The model ships with fully merged 16-bit weights, so no adapter loading is required, and it is suited to applications that demand logical coherence and well-formed structured output.


Overview

sallm/dpo_qm3_3_step20_qwen-cot-merged is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has undergone Direct Preference Optimization (DPO) using the Unsloth library, specifically targeting an improvement in reasoning abilities, particularly Chain-of-Thought (CoT), and the generation of higher-quality structured responses. This model is distributed with its full 16-bit weights merged, simplifying deployment as no separate adapter loading is required.
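Because the weights ship fully merged, the model can be loaded directly with the transformers library, with no PEFT or LoRA adapter step. A minimal loading sketch, assuming transformers and torch are installed; the prompt and generation settings are illustrative choices, not values from the model card:

```python
MODEL_ID = "sallm/dpo_qm3_3_step20_qwen-cot-merged"


def build_messages(question: str) -> list[dict]:
    """Build a chat-format message list for the tokenizer's chat template."""
    return [{"role": "user", "content": question}]


def main() -> None:
    # Heavy imports kept inside main() so the helper above stays importable
    # without pulling in torch/transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="bfloat16",  # weights are published in BF16; no adapter loading needed
        device_map="auto",
    )
    inputs = tokenizer.apply_chat_template(
        build_messages("If a train travels 60 km in 45 minutes, what is its speed in km/h?"),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

At 4B parameters in BF16, the model needs roughly 8 GB of accelerator memory for inference, so `device_map="auto"` will fall back to CPU on smaller GPUs.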

Key Capabilities

  • Enhanced Reasoning: Optimized for better logical progression and Chain-of-Thought capabilities.
  • Improved Structured Responses: Designed to produce more coherent and well-formed structured outputs.
  • Simplified Deployment: Provided as a fully merged model, ready for direct use with transformers without LoRA adapter management.

Good for

  • Applications requiring robust reasoning and problem-solving.
  • Tasks where structured and logically sound outputs are critical.
  • Developers seeking a Qwen3-4B variant with enhanced CoT and response quality through DPO.
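When consuming Chain-of-Thought output downstream, it often helps to separate the reasoning trace from the final answer. A small helper sketch; the "Final answer:" marker is an assumption about your own prompting convention, not something the model card specifies:

```python
def extract_final_answer(text: str, marker: str = "Final answer:") -> str:
    """Return the text after the last occurrence of `marker`.

    Falls back to the whole stripped text when the marker is absent,
    which covers free-form responses.
    """
    idx = text.rfind(marker)
    if idx == -1:
        return text.strip()
    return text[idx + len(marker):].strip()


# Example: a CoT-style response with reasoning followed by a marked answer.
response = (
    "The train covers 60 km in 45 minutes, which is 0.75 hours.\n"
    "Speed = 60 / 0.75 = 80 km/h.\n"
    "Final answer: 80 km/h"
)
print(extract_final_answer(response))  # → 80 km/h
```

Using `rfind` rather than `find` makes the parse robust to the marker phrase appearing inside the reasoning itself.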