Model Overview
chenyongxi/Qwen2.5-1.5B-DPO-1.5B is a 1.5 billion parameter language model, fine-tuned from an unspecified Qwen2.5 base model. It was trained using Direct Preference Optimization (DPO), a method that aligns a language model with human preferences directly from pairwise preference data, treating the policy itself as an implicit reward model instead of training a separate one. The training used the BAAI/Infinity-Preference dataset and the TRL (Transformer Reinforcement Learning) library.
Key Capabilities
- Preference Alignment: Optimized through DPO to generate responses that are aligned with human preferences, making it suitable for conversational AI and interactive applications.
- Text Generation: Capable of generating coherent and contextually relevant text based on given prompts.
- Long Context Understanding: Supports a context length of 32,768 tokens, allowing it to process long documents and extended multi-turn conversations.
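Since the capabilities above highlight conversational use, it helps to see how Qwen2.5-family chat models expect their input to be formatted. They use a ChatML-style template; in practice you would call the tokenizer's `apply_chat_template` method, but a minimal plain-Python sketch of the format looks like this (the exact template shipped with this checkpoint should be treated as authoritative):

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts in the ChatML style
    used by Qwen2.5 chat models."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
```

Using the tokenizer's own `apply_chat_template` rather than hand-building strings like this avoids drift if the checkpoint's template differs from the sketch.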
Training Details
The model's training procedure involved:
- Methodology: Direct Preference Optimization (DPO), as detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (arXiv:2305.18290).
- Dataset: Fine-tuned on the BAAI/Infinity-Preference dataset.
- Frameworks: Developed using TRL (version 0.28.0.dev0), Transformers (version 4.56.2), PyTorch (version 2.8.0+cu128), Datasets (version 3.0.0), and Tokenizers (version 0.22.2).
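The DPO objective from the cited paper is compact enough to sketch directly: given total sequence log-probabilities of the chosen and rejected responses under both the policy and a frozen reference model, the loss is the negative log-sigmoid of the scaled difference of log-ratios. A minimal pure-Python sketch (the beta value and log-probabilities below are illustrative, not taken from this model's training configuration):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is a total sequence log-probability; beta controls how
    strongly the policy is penalized for drifting from the reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written as log1p(exp(-x)) for numerical stability
    return math.log1p(math.exp(-margin))

# When the policy favors the chosen response more than the reference does,
# the margin is positive and the loss drops below log(2).
loss = dpo_loss(-12.0, -20.0, -14.0, -18.0)
```

TRL's `DPOTrainer` computes this same quantity batched over token-level log-probabilities; the sketch only shows the scalar form of the objective.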
Good For
- Applications requiring preference-aligned text generation.
- Conversational agents and chatbots where response quality and human-likeness are important.
- General instruction-following tasks where preference tuning improves response helpfulness and tone.