## Model Overview
motobrew/qwen-dpo-v3 is a specialized language model developed by motobrew, built upon the motobrew/qwen3-adv-comp-v34 base model. It leverages Direct Preference Optimization (DPO), implemented via the Unsloth library, to align its responses with preferred outputs.
## Key Capabilities
- Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning, leading to more coherent and logical multi-step responses.
- Structured Output Quality: Fine-tuned to produce higher quality structured responses, making it suitable for tasks requiring specific formats or organized information.
- Preference Alignment: Trained with a DPO objective to better match desired output characteristics based on a preference dataset.
## Training Details
The model underwent one epoch of DPO training with a learning rate of 2e-6, a DPO beta of 0.02, and a maximum sequence length of 1024 tokens. Training used the motobrew/alf-dpo-from-top-alf93-v0 preference dataset.
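For intuition, the per-example DPO objective can be sketched in plain Python. The function name and the log-probability inputs below are illustrative (a real trainer such as Unsloth/TRL computes these from model forward passes); beta defaults to the 0.02 used for this model:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.02) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is a summed token log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp      # log pi(y_w)/pi_ref(y_w)
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi(y_l)/pi_ref(y_l)
    logits = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable -log(sigmoid(logits))
    return math.log1p(math.exp(-logits))
```

With identical policy and reference log-probabilities the loss is log(2); as the policy starts favoring the chosen response more than the reference does, the loss decreases. The small beta (0.02) keeps the policy close to the reference model.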
## Usage Considerations
This model is intended for use with the transformers library. It is released under the MIT License, per the dataset terms; users must also comply with the license of the original base model, motobrew/qwen3-adv-comp-v34.
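A minimal inference sketch with the transformers library, assuming the model is hosted under the motobrew/qwen-dpo-v3 repo id; the `generate` helper and generation settings below are illustrative assumptions, not an official API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "motobrew/qwen-dpo-v3"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Run a single greedy generation; settings are illustrative defaults."""
    # Loading happens inside the function so importing this sketch stays cheap.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Keep prompts within the 1024-token maximum sequence length used during training for best results.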