azherali/Qwen2.5-1.5B-Instruct-dpo

Hosted on the Hugging Face Hub.

  • Task: Text generation
  • Model size: 1.5B parameters
  • Precision: BF16
  • Context length: 32k tokens
  • Published: Jan 8, 2026
  • Architecture: Transformer

azherali/Qwen2.5-1.5B-Instruct-dpo is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned by azherali using Direct Preference Optimization (DPO). Built on Qwen/Qwen2.5-1.5B-Instruct, it is optimized for generating high-quality, preference-aligned responses. It is suitable for a range of instruction-following tasks, leveraging its DPO training for improved conversational coherence and helpfulness.


Overview

azherali/Qwen2.5-1.5B-Instruct-dpo is a 1.5 billion parameter language model, building upon the base Qwen/Qwen2.5-1.5B-Instruct architecture. This model has been specifically fine-tuned using Direct Preference Optimization (DPO), a method detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (huggingface.co/papers/2305.18290). The DPO training process aims to align the model's outputs more closely with human preferences, enhancing its ability to follow instructions and generate desirable responses.
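The DPO objective from the cited paper can be stated compactly: given a preferred (chosen) and a dispreferred (rejected) response, the loss is the negative log-sigmoid of a scaled difference between how much the policy favors each response relative to a frozen reference model. A minimal per-example sketch (function and argument names here are illustrative; TRL's actual implementation works on batched token log-probabilities in PyTorch):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is a summed log-probability of a full response under either
    the policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# When training widens the gap in favor of the chosen response, loss is low;
# when the policy drifts toward the rejected response, loss grows.
low = dpo_loss(-10.0, -30.0, ref_chosen_logp=-12.0, ref_rejected_logp=-25.0)
high = dpo_loss(-20.0, -15.0, ref_chosen_logp=-12.0, ref_rejected_logp=-25.0)
```

The `beta` parameter controls how strongly the policy is penalized for deviating from the reference model; 0.1 is a common default in TRL, but the value used for this particular fine-tune is not reported on the card.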

Key Capabilities

  • Instruction Following: Designed to accurately interpret and execute user instructions.
  • Preference Alignment: Optimized through DPO to produce outputs that are generally preferred by humans.
  • Text Generation: Capable of generating coherent and contextually relevant text based on prompts.
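To exercise these capabilities, prompts must follow the model's chat format. Qwen2.5 chat models use a ChatML-style template; in practice you should call `tokenizer.apply_chat_template()`, which reads the template bundled with the model, but the layout can be sketched as:

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a message list in the ChatML-style format used by Qwen2.5 chat models.

    Illustrative only: prefer tokenizer.apply_chat_template(), which applies
    the exact template shipped with the model checkpoint.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
```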

Training Details

The model was trained using the TRL library (version 0.26.2) from Hugging Face, with Transformers version 4.57.3 and PyTorch 2.8.0+cu126. Pinning these versions makes it straightforward to reproduce the training environment.
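Assuming these pinned versions are available for your platform, the reported stack could be installed along these lines (the CUDA 12.6 wheel index URL is the standard PyTorch convention, not something stated on the card):

```shell
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu126
pip install trl==0.26.2 transformers==4.57.3
```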

When to Use

This model is particularly well-suited for applications requiring a compact yet capable instruction-tuned model where response quality and alignment with user preferences are important. Its 1.5 billion parameters make it a good choice for scenarios where computational resources are a consideration, offering a balance between performance and efficiency.