motobrew/qwen-dpo-v66
Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Mar 8, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

motobrew/qwen-dpo-v66 is a 4 billion parameter language model developed by motobrew, fine-tuned from motobrew/qwen3-adv-comp-v34 using Direct Preference Optimization (DPO). The model is optimized to align responses with preferred outputs, with a focus on improving Chain-of-Thought reasoning and generating structured responses. Its 32768-token context length supports complex tasks that require detailed understanding and generation.


Overview

motobrew/qwen-dpo-v66 is a 4 billion parameter language model developed by motobrew, built upon the motobrew/qwen3-adv-comp-v34 base model. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library to enhance its response quality and alignment with desired outputs.

Key Capabilities

  • Improved Reasoning: Optimized to enhance Chain-of-Thought reasoning, allowing for more logical and coherent multi-step problem-solving.
  • Structured Response Generation: Fine-tuned to produce higher quality, structured outputs based on preference datasets.
  • Preference Alignment: Uses DPO to align model behavior with human preference data, leading to more desirable and useful responses.

Training Details

The model underwent 1 epoch of DPO training with a learning rate of 2e-06 and a beta value of 0.01. It was trained with a maximum sequence length of 2048 tokens, using the motobrew/alf-dpo-from-top-alf93-v0 dataset for preference optimization.
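The beta value above controls the strength of the DPO objective. As a minimal sketch (plain Python; the function name and scalar inputs are illustrative, not taken from the actual training code), the per-pair DPO loss that these hyperparameters feed into looks like this:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Direct Preference Optimization loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy model or the frozen reference
    model (here, the base model before DPO fine-tuning).
    """
    # Implicit reward margins: how much the policy has moved each
    # response's log-probability relative to the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Loss is -log sigmoid(beta * (chosen_margin - rejected_margin)),
    # so it falls as the policy favors the chosen response more
    # strongly than the reference model does.
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# A policy that lifted the chosen response and suppressed the rejected
# one gets a lower loss than one that did the opposite.
aligned = dpo_loss(-8.0, -14.0, -10.0, -12.0)
misaligned = dpo_loss(-12.0, -10.0, -10.0, -12.0)
```

With a small beta such as 0.01, the objective tolerates only gentle deviation from the reference model, which matches the conservative single-epoch, low-learning-rate setup described above.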

Good For

  • Applications requiring enhanced reasoning abilities.
  • Scenarios where structured and aligned responses are critical.
  • Tasks benefiting from models optimized through direct preference learning.