konghou/Qwen2.5-1.5B-DPO-1.5B
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 5, 2026 · Architecture: Transformer

The konghou/Qwen2.5-1.5B-DPO-1.5B model is a 1.5 billion parameter language model fine-tuned using Direct Preference Optimization (DPO). This model leverages the Qwen2.5 architecture and was trained on the BAAI/Infinity-Preference dataset. It is specifically optimized for generating responses aligned with human preferences, making it suitable for conversational AI and instruction-following tasks.


Model Overview

The konghou/Qwen2.5-1.5B-DPO-1.5B is a 1.5 billion parameter language model built upon the Qwen2.5 architecture. It has been fine-tuned using Direct Preference Optimization (DPO), a method designed to align language models with human preferences by directly optimizing a policy against a reward model implicitly defined by human comparisons.
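The DPO objective described above can be written per preference pair as −log σ(β[(log π(y_c|x) − log π_ref(y_c|x)) − (log π(y_r|x) − log π_ref(y_r|x))]), where y_c and y_r are the chosen and rejected responses. A minimal pure-Python sketch of that loss (the function name and the example β value are illustrative, not part of the model card):

```python
import math


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * margin).

    Each argument is the summed log-probability of a full response
    under the policy (pi_*) or the frozen reference model (ref_*).
    """
    # Implicit reward margin between the chosen and rejected responses.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # Logistic loss on the scaled margin; minimized when the policy
    # assigns relatively more probability to the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy and reference agree (margin 0), the loss is log 2; it shrinks as the policy increasingly prefers the chosen response over the rejected one.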

Key Capabilities

  • Preference Alignment: Optimized to generate responses that are preferred by humans, making it suitable for applications requiring nuanced and helpful outputs.
  • Instruction Following: Benefits from DPO training to better understand and adhere to user instructions.
  • Conversational AI: Well-suited for dialogue systems and chatbots where generating natural and preferred responses is crucial.
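For conversational use, Qwen2.5-family checkpoints are typically queried through the tokenizer's chat template. A hedged usage sketch with the Hugging Face `transformers` library (the system prompt and generation parameters are illustrative; loading requires the weights to be available locally or via the Hub):

```python
def build_messages(user_prompt: str) -> list:
    """Wrap a user prompt in the chat-message format Qwen2.5 models expect."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]


def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    """Load konghou/Qwen2.5-1.5B-DPO-1.5B and generate one reply.

    Requires `transformers` and `torch`; imports are deferred so the
    helpers above stay usable without them.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "konghou/Qwen2.5-1.5B-DPO-1.5B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # Render the conversation with the model's chat template, then generate.
    text = tokenizer.apply_chat_template(
        build_messages(user_prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```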

Training Details

This model was trained on the BAAI/Infinity-Preference dataset using the TRL (Transformer Reinforcement Learning) library. The DPO method, introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," was central to its fine-tuning process. The training used TRL 1.0.0, Transformers 5.0.0, PyTorch 2.8.0, Datasets 4.8.4, and Tokenizers 0.22.2.
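A training run like the one described above can be sketched with TRL's `DPOTrainer`. This is a minimal illustration, not the author's script: the base checkpoint name, hyperparameters, and dataset column mapping are assumptions, and Infinity-Preference may need preprocessing into the prompt/chosen/rejected layout `DPOTrainer` expects.

```python
def train_dpo(output_dir: str = "qwen2.5-1.5b-dpo") -> None:
    """Sketch of DPO fine-tuning with TRL on BAAI/Infinity-Preference.

    Requires `trl`, `transformers`, and `datasets`; imports are deferred
    so defining this function does not pull in those libraries.
    """
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    # Assumed base checkpoint; the card does not name the starting model.
    base = "Qwen/Qwen2.5-1.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # Paired preference data; columns may need renaming to
    # prompt/chosen/rejected before training.
    dataset = load_dataset("BAAI/Infinity-Preference", split="train")

    # Illustrative hyperparameters; beta scales the DPO reward margin.
    args = DPOConfig(
        output_dir=output_dir,
        beta=0.1,
        per_device_train_batch_size=2,
    )
    trainer = DPOTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        processing_class=tokenizer,
    )
    trainer.train()
```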

Good For

  • Developing chatbots and virtual assistants that require human-like conversational abilities.
  • Applications where generating preferred and aligned text is a priority.
  • Research into preference-based fine-tuning methods for smaller language models.