azherali/Qwen2.5-1.5B-Instruct-dpo

Hosted on the Hugging Face Hub.

  • Task: Text generation
  • Model size: 1.5B parameters
  • Precision: BF16
  • Context length: 32k tokens
  • Published: Jan 8, 2026
  • Architecture: Transformer

azherali/Qwen2.5-1.5B-Instruct-dpo is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned by azherali using Direct Preference Optimization (DPO). Built on Qwen/Qwen2.5-1.5B-Instruct, it is optimized for generating high-quality, preference-aligned responses. It is suitable for a range of instruction-following tasks, leveraging its DPO training for improved conversational coherence and helpfulness.


Overview

azherali/Qwen2.5-1.5B-Instruct-dpo is a 1.5 billion parameter language model, building upon the base Qwen/Qwen2.5-1.5B-Instruct architecture. This model has been specifically fine-tuned using Direct Preference Optimization (DPO), a method detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (huggingface.co/papers/2305.18290). The DPO training process aims to align the model's outputs more closely with human preferences, enhancing its ability to follow instructions and generate desirable responses.
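The DPO objective from the cited paper can be stated compactly: given a preferred (chosen) and a dispreferred (rejected) response, the loss is the negative log-sigmoid of a scaled difference between how much the policy favors each response relative to a frozen reference model. A minimal per-example sketch (function and argument names here are illustrative; TRL's actual implementation works on batched token log-probabilities in PyTorch):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is a summed log-probability of a full response under either
    the policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# When training widens the gap in favor of the chosen response, loss is low;
# when the policy drifts toward the rejected response, loss grows.
low = dpo_loss(-10.0, -30.0, ref_chosen_logp=-12.0, ref_rejected_logp=-25.0)
high = dpo_loss(-20.0, -15.0, ref_chosen_logp=-12.0, ref_rejected_logp=-25.0)
```

The `beta` parameter controls how strongly the policy is penalized for deviating from the reference model; 0.1 is a common default in TRL, but the value used for this particular fine-tune is not reported on the card.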

Key Capabilities

  • Instruction Following: Designed to accurately interpret and execute user instructions.
  • Preference Alignment: Optimized through DPO to produce outputs that are generally preferred by humans.
  • Text Generation: Capable of generating coherent and contextually relevant text based on prompts.
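To exercise these capabilities, prompts must follow the model's chat format. Qwen2.5 chat models use a ChatML-style template; in practice you should call `tokenizer.apply_chat_template()`, which reads the template bundled with the model, but the layout can be sketched as:

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a message list in the ChatML-style format used by Qwen2.5 chat models.

    Illustrative only: prefer tokenizer.apply_chat_template(), which applies
    the exact template shipped with the model checkpoint.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
```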

Training Details

The model was trained using the TRL library (version 0.26.2) from Hugging Face, with Transformers version 4.57.3 and PyTorch 2.8.0+cu126. Pinning these versions makes it straightforward to reproduce the training environment.
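Assuming these pinned versions are available for your platform, the reported stack could be installed along these lines (the CUDA 12.6 wheel index URL is the standard PyTorch convention, not something stated on the card):

```shell
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu126
pip install trl==0.26.2 transformers==4.57.3
```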

When to Use

This model is particularly well-suited for applications requiring a compact yet capable instruction-tuned model where response quality and alignment with user preferences are important. Its 1.5 billion parameters make it a good choice for scenarios where computational resources are a consideration, offering a balance between performance and efficiency.