Nao-Taka/dpo-qwen-cot-merged is a 4-billion-parameter, Qwen3-based causal language model fine-tuned with Direct Preference Optimization (DPO). It specializes in Chain-of-Thought (CoT) reasoning and structured response generation, supports a 40,960-token context window, and is optimized for tasks requiring coherent, logically organized output.
Overview
Nao-Taka/dpo-qwen-cot-merged is a 4 billion parameter language model built upon the Qwen/Qwen3-4B-Instruct-2507 base model. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, with its 16-bit weights fully merged, eliminating the need for adapter loading.
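Because the 16-bit weights are fully merged, the model loads like any standard Hugging Face checkpoint, with no PEFT/adapter step. A minimal inference sketch, assuming the `transformers` library is installed (the helper name and generation settings below are illustrative, not part of the card):

```python
def generate_response(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a response from the merged checkpoint; the weights load
    directly, with no separate LoRA adapter to attach."""
    # Heavy imports kept inside the function so the sketch stays lightweight.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Nao-Taka/dpo-qwen-cot-merged"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    # Qwen3 instruct models expect chat-formatted input.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling `generate_response("Walk through 17 * 24 step by step.")` should elicit the model's CoT-style reasoning before the final answer.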
Key Capabilities
- Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, leading to more logical and structured outputs.
- Preference Alignment: Uses DPO to align responses with preferred outputs from the u-10bei/dpo-dataset-qwen-cot preference dataset.
- Structured Response Quality: Produces well-organized, consistently formatted responses.
Training Details
The model underwent 1 epoch of DPO training with a learning rate of 5e-06 and a beta value of 0.1. It was trained with a maximum sequence length of 1024, using a LoRA configuration (r=8, alpha=16) that has since been merged into the base model. The training data used for DPO was u-10bei/dpo-dataset-qwen-cot.
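The hyperparameters above can be collected into a training sketch. This is a hypothetical reconstruction assuming Unsloth together with TRL's `DPOTrainer`; exact argument names vary across TRL versions, and the dataset split is an assumption not stated on the card:

```python
from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import DPOConfig, DPOTrainer

# Base model, capped at the training-time sequence length reported on the card.
model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507", max_seq_length=1024
)
# LoRA configuration (r=8, alpha=16) that was later merged into the base weights.
model = FastLanguageModel.get_peft_model(model, r=8, lora_alpha=16)

args = DPOConfig(
    num_train_epochs=1,
    learning_rate=5e-6,
    beta=0.1,          # DPO preference temperature
    max_length=1024,
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train"),
    processing_class=tokenizer,
)
trainer.train()
# Merge the 16-bit LoRA weights and save a standalone checkpoint.
model.save_pretrained_merged("dpo-qwen-cot-merged", tokenizer, save_method="merged_16bit")
```

The final merge step is what makes the published checkpoint loadable without any adapter files.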
Licensing
This model is released under the MIT License, following the license of its training dataset. Users must also comply with the license terms of the base model, Qwen/Qwen3-4B-Instruct-2507.