Amu/orpo-phi2: ORPO Fine-tuning Experiment
Amu/orpo-phi2 is a 2.7 billion parameter language model derived from Microsoft's Phi-2 base model. It is an experimental fine-tune produced with the ORPO (Odds Ratio Preference Optimization) method, implemented through the trl library.
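For context, ORPO (as introduced in the original paper) augments the standard supervised fine-tuning loss with an odds-ratio penalty that favors the chosen response $y_w$ over the rejected response $y_l$; the weighting $\lambda$ corresponds to the `beta` parameter in trl:

$$
\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}, \qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right), \qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

Because the penalty is computed from the policy's own odds, ORPO needs no frozen reference model (unlike DPO), which keeps the memory footprint of preference training modest on a model of this size.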
Key Capabilities & Characteristics
- Base Model: Built upon the efficient and capable `microsoft/phi-2` architecture.
- Fine-tuning Method: Leverages the ORPO algorithm, a preference-based alignment technique, for instruction tuning (see the training sketch after this list).
- Training Data: Fine-tuned on the `HuggingFaceH4/ultrafeedback_binarized` dataset, which is designed for preference learning.
- Context Length: Supports a context window of 2048 tokens.
- Purpose: Primarily serves as a demonstration and testbed for the ORPO fine-tuning approach on a smaller-scale model; the focus is preference-alignment experimentation rather than general-purpose instruction following.
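For readers who want to reproduce the setup, the sketch below shows how a fine-tune like this is typically wired up with trl's `ORPOTrainer`. The exact hyperparameters used for Amu/orpo-phi2 (beta, batch size, epochs) are not published, so the values here are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token  # Phi-2's tokenizer ships without a pad token

# prompt / chosen / rejected preference triples; depending on your trl version
# you may need to flatten the chat-formatted chosen/rejected columns to plain strings.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = ORPOConfig(
    output_dir="orpo-phi2",
    beta=0.1,                       # weight of the odds-ratio term (lambda); assumed, not published
    max_length=2048,                # matches the model's context window
    per_device_train_batch_size=2,  # illustrative; size to your hardware
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # renamed to processing_class in newer trl releases
)
trainer.train()
```

Note that ORPO folds preference optimization into a single training pass over the base model, so no separate SFT stage or reference model is required.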
Good For
- Researchers and Developers: Ideal for those interested in exploring or reproducing the ORPO fine-tuning method.
- Understanding Preference Alignment: Provides a practical example of how ORPO can be applied to align language models with human preferences.
- Resource-Constrained Environments: Its compact ~2.7B parameter size makes it suitable for experimentation where larger models would be prohibitive (see the loading example after this list).
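As a starting point, the snippet below shows one way to load and query the checkpoint with transformers. The prompt format used during fine-tuning is not documented, so a plain free-text prompt is assumed here; Phi-2's usual "Instruct: ... Output:" convention may also work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Amu/orpo-phi2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~2.7B params fit on a single consumer GPU at fp16
    device_map="auto",          # requires the accelerate package
)

prompt = "Explain in two sentences what preference optimization does for a language model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```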