Overview
This model, sho-nakamura/dpo-qwen-cot-merged, is a 4-billion-parameter variant of the Qwen3 architecture developed by sho-nakamura. It was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library, building on the sho-nakamura/qwen3-4b-instruct-sft-lora-structured base model. DPO training aligned the model's responses with preferred outputs, specifically strengthening its Chain-of-Thought reasoning and structured response generation.
Key Capabilities
- Enhanced Reasoning: Optimized for Chain-of-Thought (CoT) prompting to improve logical deduction.
- Structured Output: Generates responses in consistent, structured formats, a behavior reinforced by the preference data used during DPO training.
- Full-Merged Weights: Provided as full-merged 16-bit weights, eliminating the need for adapter loading.
- Qwen3 Base: Leverages the robust foundation of the Qwen3-4B-Instruct model.
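Since the weights are fully merged, the model can be prompted like any Qwen3 chat model. Qwen models use a ChatML-style template with `<|im_start|>` / `<|im_end|>` turn markers; in practice you would call `tokenizer.apply_chat_template` from the transformers library, but the hand-rolled sketch below illustrates the prompt layout (the message contents are illustrative, not from the model card):

```python
# Minimal sketch of the ChatML-style prompt format used by Qwen models.
# For real inference, prefer tokenizer.apply_chat_template; this version
# only shows what the rendered prompt looks like.

def format_chatml(messages):
    """Render a list of {role, content} dicts into a ChatML-style prompt."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant. Think step by step."},
    {"role": "user", "content": "If a train travels 60 km in 40 minutes, what is its speed in km/h?"},
])
print(prompt)
```

Asking the model to "think step by step" in the system or user turn plays to the CoT behavior the DPO training reinforced.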
Training Details
The model underwent 1 epoch of DPO training with a learning rate of 1e-7 and a beta value of 0.1, using a maximum sequence length of 1024 tokens. The LoRA adapters (r=8, alpha=16) were then merged into the base model to produce the full 16-bit weights.
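To make the role of beta concrete: the standard per-example DPO objective is -log σ(β · margin), where the margin is the difference in policy-vs-reference log-ratios between the chosen and rejected responses. A minimal sketch (the log-probability values below are illustrative, not from this model's training):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    The margin is how much more the policy (vs. the frozen reference model)
    prefers the chosen response over the rejected one."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = chosen_ratio - rejected_ratio
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With beta=0.1 (the value used here), a larger margin in favour of the
# chosen response yields a smaller loss.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # margin = 2.0
```

A small beta such as 0.1 softens the preference signal, keeping the policy close to the reference model while still pushing it toward the chosen responses.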
Good For
- Applications requiring strong reasoning abilities.
- Tasks where structured and formatted outputs are crucial.
- Developers looking for a Qwen3-based model with improved CoT and structured response generation.