sei0621/dpo-qwen-cot-merged

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Feb 6, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

The sei0621/dpo-qwen-cot-merged model is a 4 billion parameter Qwen3-based causal language model fine-tuned with Direct Preference Optimization (DPO) via Unsloth. It is optimized to strengthen Chain-of-Thought (CoT) reasoning and improve the quality of structured responses, making it suited to tasks that demand logical coherence and adherence to preferred output formats.


Model Overview

This model, sei0621/dpo-qwen-cot-merged, is a 4 billion parameter language model based on Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library to better align its outputs with preferred responses.

Key Capabilities

  • Improved Reasoning: Optimized to enhance Chain-of-Thought (CoT) reasoning, allowing for more logical and structured problem-solving.
  • Enhanced Response Quality: DPO training aligns the model's outputs with preferred responses, leading to higher quality and more structured generations.
  • Direct Use: Provided as a full-merged 16-bit weight model, eliminating the need for adapter loading and simplifying deployment with transformers.
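Since the DPO deltas are already merged into the published 16-bit weights, the model can be loaded directly with `transformers` and no PEFT adapter step. A minimal sketch is below; the prompt and generation settings are illustrative assumptions, not recommendations from the model authors.

```python
# Minimal sketch: load the merged BF16 weights directly with transformers.
# No adapter loading is required because the DPO weights are fully merged.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sei0621/dpo-qwen-cot-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

# Example prompt (illustrative); the chat template handles role formatting.
messages = [{"role": "user", "content": "Solve step by step: what is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the weights are BF16, expect roughly 8 GB of accelerator memory for inference at this model size.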

Training Details

The model was trained for 1 epoch of DPO with a learning rate of 5e-7 and a DPO beta of 0.1, using a maximum sequence length of 1024 tokens on the u-10bei/dpo-dataset-qwen-cot preference dataset. The base model supports a context length of 32,768 tokens.
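For context, the beta of 0.1 scales the implicit reward margin in the standard DPO objective (Rafailov et al.): the loss is `-log sigmoid(beta * ((logp_w - logp_w_ref) - (logp_l - logp_l_ref)))`, where `w` is the chosen and `l` the rejected response. A minimal per-example sketch in plain Python, with made-up illustrative log-probabilities:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    logits = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Made-up sequence log-probabilities for illustration: the policy has shifted
# probability mass toward the chosen response relative to the reference model.
loss = dpo_loss(-10.0, -14.0, ref_chosen_logp=-11.0, ref_rejected_logp=-12.0, beta=0.1)
```

A smaller beta (such as the 0.1 used here) keeps the implicit rewards small, which tolerates larger divergence from the reference policy before the loss saturates.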

Usage Considerations

This model is suitable for applications where improved reasoning, structured output, and alignment with specific response preferences are critical. Users should note that, per the training dataset's terms, the model follows the MIT License, and that compliance with the original base model's license is also required.