NLP-Final-Project/phi-2-ipo
NLP-Final-Project/phi-2-ipo is a 2.7 billion parameter language model fine-tuned from Microsoft's phi-2 base model. It was trained with Direct Preference Optimization (DPO) via the TRL library to better align its outputs with human preferences. The model is designed for text generation tasks, offering improved response quality through preference-based learning.
Model Overview
NLP-Final-Project/phi-2-ipo is a 2.7 billion parameter language model fine-tuned from the original microsoft/phi-2 base model. The fine-tuning used Direct Preference Optimization (DPO), the method introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (https://arxiv.org/abs/2305.18290). Training was conducted with the TRL library, version 1.3.0.
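The model can be loaded and queried like any other causal language model in the Transformers ecosystem. Below is a minimal inference sketch; it assumes the repository exposes standard Transformers weights, and the prompt and generation settings are illustrative rather than recommended values.

```python
# Minimal inference sketch; the prompt and sampling settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NLP-Final-Project/phi-2-ipo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
    device_map="auto",          # requires the accelerate package
)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```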
Key Capabilities
- Preference-Aligned Text Generation: Generates responses that align more closely with human preferences as a result of DPO training.
- Efficient Fine-tuning: Demonstrates that DPO can be applied effectively to fine-tune smaller language models.
Training Details
The model's training procedure leveraged DPO, which optimizes a language model directly on preference data, without fitting a separate reward model, to improve the quality and helpfulness of generated text. The training environment included Transformers 5.8.0, PyTorch 2.11.0, Datasets 4.8.5, and Tokenizers 0.22.2.
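For reference, DPO fine-tunes the policy $\pi_\theta$ against a frozen reference model $\pi_{\mathrm{ref}}$ using preference pairs $(x, y_w, y_l)$, where $y_w$ is the preferred completion, by minimizing the objective from the paper cited above:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

The exact training script, dataset, and hyperparameters for this model are not documented here. The following is a minimal sketch of a DPO run with TRL's DPOTrainer; the dataset, output path, and hyperparameter values are illustrative assumptions, and argument names vary across TRL versions (older releases use tokenizer= where recent ones use processing_class=).

```python
# Minimal DPO fine-tuning sketch with TRL's DPOTrainer.
# The dataset, output path, and hyperparameters below are illustrative
# assumptions, not the actual configuration used for phi-2-ipo.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "microsoft/phi-2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # phi-2 ships without a pad token

# Any preference dataset with prompt/chosen/rejected pairs works here.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(
    output_dir="phi-2-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    beta=0.1,  # strength of the implicit KL constraint toward the reference model
)

trainer = DPOTrainer(
    model=model,                 # policy to optimize
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

When no ref_model is passed, DPOTrainer creates a frozen copy of the policy to serve as $\pi_{\mathrm{ref}}$, matching the setup in the objective above.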