Name: shinich001/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: shinich001

Overview

This model, shinich001/dpo-qwen-cot-merged, is a 4 billion parameter instruction-tuned variant of the Qwen3-4B-Instruct-2507 base model. It leverages Direct Preference Optimization (DPO) via the Unsloth library to align its responses with preferred outputs. The repository provides the full-merged 16-bit weights, eliminating the need for adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, leading to more structured and logical outputs.
Improved Response Quality: Fine-tuned to produce higher quality, aligned responses based on a preference dataset.
Direct Usage: As a merged model, it can be used directly with the transformers library without additional configuration.

Training Details

The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1. It utilized a maximum sequence length of 1024 and incorporated LoRA (r=8, alpha=16) which has been merged into the base weights. The training data used was u-10bei/dpo-dataset-qwen-cot.

License

This model is released under the MIT License, adhering to the terms of its training dataset. Users must also comply with the original base model's license terms.

Overview

Overview

Key Capabilities

Training Details

License

Full Model Card (README)