Name: tabidance/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: tabidance

Model Overview

The tabidance/dpo-qwen-cot-merged model is a 4 billion parameter language model based on the Qwen/Qwen3-4B-Instruct-2507 architecture. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library, resulting in a merged 16-bit weight model that requires no adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning, allowing for more structured and logical response generation.
Improved Structured Output: Specifically trained to align responses with preferred outputs, enhancing the quality of structured data generation.
DPO Fine-tuning: Utilizes DPO to align model behavior with human preferences, leading to more desirable and coherent outputs.
Direct Usage: As a fully merged model, it can be used directly with the transformers library without additional configuration.

Training Details

The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1, using a maximum sequence length of 1024. The training data utilized was u-10bei/dpo-dataset-qwen-cot. The model's license is MIT, consistent with the dataset terms, and users must also adhere to the original base model's license.

Overview

Model Overview

Key Capabilities

Training Details

Full Model Card (README)