Name: AshleyQu0311/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: AshleyQu0311

Overview

AshleyQu0311/dpo-qwen-cot-merged is a 4 billion parameter language model built upon the Qwen3-4B-Instruct-2507 base model. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library, resulting in a full-merged 16-bit weight model that requires no adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, making it suitable for tasks requiring multi-step logical deduction.
Structured Response Quality: Aligned to produce preferred outputs and higher quality structured responses based on its DPO training.
Efficient Deployment: Provided as a fully merged model, simplifying integration and usage with the transformers library.

Training Details

The model underwent 1 epoch of DPO training with a learning rate of 5e-07 and a beta value of 0.1. It utilized a maximum sequence length of 1024 during training and incorporated LoRA configuration (r=8, alpha=16) which was subsequently merged into the base model. The training data used was u-10bei/dpo-dataset-qwen-cot.

Licensing

This model is released under the MIT License, consistent with its training dataset. Users must also adhere to the license terms of the original Qwen3 base model.

Overview

Overview

Key Capabilities

Training Details

Licensing

Full Model Card (README)