motobrew/qwen-dpo-v66 is a 4-billion-parameter language model fine-tuned by motobrew from motobrew/qwen3-adv-comp-v34 using Direct Preference Optimization (DPO). It is optimized for aligning responses with preferred outputs, with a focus on Chain-of-Thought reasoning and structured response generation, and supports a 32,768-token context length for complex, long-input tasks.
Overview
Built on the motobrew/qwen3-adv-comp-v34 base model, qwen-dpo-v66 was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library to improve response quality and alignment with preferred outputs.
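As a minimal sketch, the model can be loaded with the Hugging Face transformers library, assuming it is published on the Hub under the id above (precision and device settings here are illustrative choices, not the card's recommendations):

```python
# Minimal loading sketch, assuming motobrew/qwen-dpo-v66 is available on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "motobrew/qwen-dpo-v66"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is sufficient for inference
    device_map="auto",           # requires `accelerate`; places layers across available devices
)
```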
Key Capabilities
- Improved Reasoning: Optimized for Chain-of-Thought reasoning, enabling more logical and coherent multi-step problem solving (see the generation sketch after this list).
- Structured Response Generation: Fine-tuned to produce higher quality, structured outputs based on preference datasets.
- Preference Alignment: Utilizes DPO to align model behavior with preferred human feedback, leading to more desirable and useful responses.
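The sketch below, continuing from the loading snippet above, shows one way to elicit step-by-step reasoning through the tokenizer's chat template. The prompt and generation settings are assumptions for illustration, not settings from the card:

```python
# Illustrative generation sketch; the prompt and sampling settings are assumptions.
messages = [
    {
        "role": "user",
        "content": "A train travels 60 km in 45 minutes. "
                   "What is its average speed in km/h? Think step by step.",
    },
]

# Qwen-family tokenizers ship a chat template; apply it and generate.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```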
Training Details
DPO training ran for 1 epoch with a learning rate of 2e-6, a beta of 0.01, and a maximum sequence length of 2048 tokens, using the motobrew/alf-dpo-from-top-alf93-v0 dataset for preference optimization. (The shorter training sequence length does not change the architecture; the full 32,768-token context remains available at inference.)
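For context, here is a hedged reproduction sketch of this setup using TRL's DPOTrainer. The card states training was done via Unsloth; plain TRL is shown here only for brevity, exact argument names vary across TRL versions, and the dataset is assumed to follow the standard prompt/chosen/rejected schema:

```python
# Hypothetical sketch of the DPO setup described above, using TRL rather than Unsloth.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "motobrew/qwen3-adv-comp-v34"  # base model, per the card
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Assumption: the dataset has "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("motobrew/alf-dpo-from-top-alf93-v0", split="train")

config = DPOConfig(
    output_dir="qwen-dpo-v66",
    num_train_epochs=1,   # 1 epoch, per the card
    learning_rate=2e-6,   # learning rate, per the card
    beta=0.01,            # DPO beta (strength of the KL pull toward the reference model)
    max_length=2048,      # maximum sequence length, per the card
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use `tokenizer=` instead
)
trainer.train()
```

The small beta of 0.01 keeps only a weak KL penalty toward the reference policy, letting the tuned model move relatively far toward the preferred responses.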
Good For
- Applications requiring enhanced reasoning abilities.
- Scenarios where structured and aligned responses are critical.
- Tasks benefiting from models optimized through direct preference learning.