Overview
motobrew/qwen-dpo-v13 is a 4-billion-parameter language model developed by motobrew and fine-tuned from its predecessor, motobrew/qwen-dpo-v3. This iteration uses Direct Preference Optimization (DPO), implemented with the Unsloth library, to align its responses more closely with preferred outputs.
Key Capabilities
- Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning, enabling more structured and logical response generation.
- Improved Response Quality: Delivers higher-quality, better-aligned structured responses, shaped by the preference dataset used during training.
- DPO Fine-tuning: Trained with DPO using a beta of 0.05 and a learning rate of 2e-06 for 1 epoch, with a maximum sequence length of 1024 tokens.
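To make the role of the beta hyperparameter concrete, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair. The function and the log-probability values below are illustrative assumptions for exposition; they are not taken from this model's actual training code or logs.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.05):
    """Sigmoid DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = (policy_chosen_logp - ref_chosen_logp) - \
             (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)); shrinks as the policy widens the margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative log-probabilities only (hypothetical values).
loss = dpo_loss(-12.0, -20.0, -14.0, -18.0, beta=0.05)
```

A small beta such as 0.05 flattens the sigmoid, so the policy is penalized only gently for staying close to the reference model, which keeps the fine-tuned model from drifting far from its base behavior.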
Training and Licensing
The model was trained on the motobrew/alf-dpo-from-top-alf93-v0 dataset. It is released under the MIT License, consistent with the terms of its training data. Users should also comply with the original base model's license terms.
Good For
- Applications requiring models with strong reasoning abilities.
- Use cases where high-quality, structured, and preference-aligned outputs are critical.
- Developers looking for a DPO-optimized model for specific conversational or generative tasks.