Name: matsunya/dpo_qwen_cot_merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: matsunya

Model Overview

matsunya/dpo_qwen_cot_merged is a 4 billion parameter language model, derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library, resulting in a merged 16-bit weight model that requires no adapter loading.

Key Optimizations

Enhanced Reasoning: The primary objective of this DPO fine-tuning was to improve the model's reasoning abilities, particularly in generating Chain-of-Thought (CoT) responses.
Structured Output Quality: Optimization also focused on aligning the model's outputs with preferred responses to enhance the overall quality and structure of its generated text.

Training Details

The model underwent 1 epoch of DPO training with a learning rate of 3e-05 and a beta value of 0.1. It utilized a maximum sequence length of 2048 tokens. The training data for preference optimization was sourced from the u-10bei/dpo-dataset-qwen-cot dataset.

Usage Considerations

As a fully merged model, it can be directly integrated and used with the transformers library. Users must adhere to the MIT License of the training data and the original base model's license terms.

Overview

Model Overview

Key Optimizations

Training Details

Usage Considerations

Full Model Card (README)