Name: yukiakari/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: yukiakari

Model Overview

yukiakari/dpo-qwen-cot-merged is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, with its 16-bit weights fully merged for direct use without adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, making it suitable for complex problem-solving tasks.
Structured Responses: Focuses on generating higher quality, more structured outputs based on preference datasets.
Efficient Deployment: Provided as a full-merged model, simplifying integration into existing transformers workflows.

Training Details

The model underwent 1 epoch of DPO training with a learning rate of 1e-05 and a beta value of 0.1. It utilized a maximum sequence length of 1024 and incorporated LoRA configuration (r=8, alpha=16) which was subsequently merged into the base model. The training data used was u-10bei/dpo-dataset-qwen-cot.

Usage Considerations

This model is licensed under the MIT License, aligning with its training dataset. Users must also adhere to the original base model's license terms.

Overview

Model Overview

Key Capabilities

Training Details

Usage Considerations

Full Model Card (README)