moushi21/dpo-qwen-cot-merged2 is a 4-billion-parameter language model fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). The model targets improved reasoning, particularly Chain-of-Thought (CoT), and structured response generation. It is designed for tasks that require clear logical progression and high-quality, aligned outputs, making it suitable for applications where precise and coherent reasoning is critical.
Overview
moushi21/dpo-qwen-cot-merged2 is a 4-billion-parameter language model derived from unsloth/Qwen3-4B-Instruct-2507. It was fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, with its LoRA adapters merged into the base model, so it can be used directly without loading adapters separately.
Key Capabilities
- Enhanced Reasoning: Optimized specifically to improve Chain-of-Thought (CoT) reasoning, enabling more logical and step-by-step problem-solving.
- Structured Response Quality: Focuses on generating higher quality and more aligned outputs based on preference data.
- Direct Usage: Provided as a fully merged 16-bit model, allowing straightforward integration with the `transformers` library.
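Because the adapters are already merged, the model loads like any standard causal LM. The sketch below shows typical `transformers` usage; the prompt and generation settings are illustrative examples, not recommended values from the model authors.

```python
# Minimal usage sketch: load the merged model and run a chat-style generation.
# Generation parameters here are illustrative assumptions, not official settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moushi21/dpo-qwen-cot-merged2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # merged 16-bit weights
    device_map="auto",
)

# Build a chat prompt using the tokenizer's chat template.
messages = [
    {"role": "user", "content": "If a train travels 120 km in 2 hours, what is its average speed?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Since the DPO tuning targets CoT reasoning, multi-step prompts like the one above are a natural fit; no adapter-loading code (e.g. PEFT) is needed.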
Good For
- Applications requiring improved logical reasoning and problem-solving.
- Generating structured and coherent text outputs.
- Tasks where alignment with preferred response styles is crucial.
- Developers seeking a 4B parameter model with enhanced CoT capabilities for efficient deployment.