Model Overview
Tamata1208/dpo-qwen-cot-merged is a 4-billion-parameter language model built on the Qwen3-4B-Instruct-2507 base model. It was further fine-tuned with Direct Preference Optimization (DPO) using the Unsloth library.
Key Capabilities
- Enhanced Reasoning (Chain-of-Thought): The model is specifically optimized to improve its ability to generate detailed, step-by-step reasoning processes, making it suitable for complex problem-solving.
- Improved Structured Responses: Through DPO training, the model aligns its outputs with preferred formats, leading to higher quality and more consistent structured responses.
- Direct Use: This repository provides the fully merged 16-bit weights, so no adapter loading is required for deployment, which simplifies integration into existing workflows.
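Because the weights are fully merged, the model can be loaded like any standard causal LM. The sketch below assumes the `transformers` library and `torch` are installed and that enough GPU or CPU memory is available for a 4B model; it downloads the weights from the Hub on first use, so it is illustrative rather than something to run offline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tamata1208/dpo-qwen-cot-merged"

# Load tokenizer and merged 16-bit weights directly; no PEFT adapter step needed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt and generate a step-by-step answer.
messages = [{"role": "user", "content": "Solve step by step: what is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The prompt string here is just an example; any chat-style input works through `apply_chat_template`.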
Training Details
The model underwent 2 epochs of DPO training with a learning rate of 5e-06 and a beta value of 0.1, using a maximum sequence length of 2048 tokens. Training used LoRA adapters (r=8, alpha=16), which were subsequently merged into the base model. The DPO preference data came from the u-10bei/dpo-dataset-qwen-cot dataset.
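The hyperparameters above can be expressed as a training configuration. This is a rough sketch using TRL's `DPOTrainer` rather than the Unsloth pipeline actually used, so treat it as an approximation of the setup, not a reproduction script; it assumes `trl`, `peft`, and `datasets` are installed, and the output directory name is arbitrary.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# Preference dataset named in the card (chosen/rejected pairs).
dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

# LoRA settings from the card: r=8, alpha=16.
peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

# DPO hyperparameters from the card.
training_args = DPOConfig(
    output_dir="dpo-qwen-cot",   # arbitrary output path
    beta=0.1,
    learning_rate=5e-6,
    num_train_epochs=2,
    max_length=2048,
)

trainer = DPOTrainer(
    model="Qwen/Qwen3-4B-Instruct-2507",  # base model
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()

# After training, LoRA weights can be merged into the base model
# (e.g. via PEFT's merge_and_unload) and saved as full 16-bit weights.
```

The merge step at the end is what makes the published repository adapter-free.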
Licensing
This model is released under the MIT License, consistent with the terms of its training dataset. Users must also comply with the license terms of the original Qwen base model.