Name: arigedon/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: arigedon

Overview

This model, arigedon/dpo-qwen-cot-merged, is a 4 billion parameter language model based on Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library to align its responses with preferred outputs.

Key Capabilities

Enhanced Reasoning: Optimized for Chain-of-Thought (CoT) reasoning, aiming for more structured and logical outputs.
Improved Response Quality: DPO training focuses on generating higher quality and more aligned responses.
Direct Usage: Provided as a full-merged 16-bit model, eliminating the need for adapter loading and simplifying deployment with transformers.

Training Details

The model underwent 2 epochs of DPO training with a learning rate of 5e-07 and a beta of 0.2, using a maximum sequence length of 2024. The training utilized the u-10bei/dpo-dataset-qwen-cot dataset.

Licensing

This model operates under the MIT License, consistent with its training data. Users must also adhere to the original base model's license terms.

Overview

Overview

Key Capabilities

Training Details

Licensing

Full Model Card (README)