Amu/orpo-phi2: ORPO Fine-tuning Experiment
Amu/orpo-phi2 is a 2.7 billion parameter language model derived from Microsoft's Phi-2 base model. It is an experimental fine-tune produced with the ORPO (Odds Ratio Preference Optimization) method, implemented through the trl library.
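For context, ORPO (as introduced in the original paper) augments the standard supervised fine-tuning loss with an odds-ratio penalty that favors the chosen response $y_w$ over the rejected response $y_l$; the weighting $\lambda$ corresponds to the `beta` parameter in trl:

$$
\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}, \qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right), \qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

Because the penalty is computed from the policy's own odds, ORPO needs no frozen reference model (unlike DPO), which keeps the memory footprint of preference training modest on a model of this size.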
Key Capabilities & Characteristics
- Base Model: Built upon the efficient and capable `microsoft/phi-2` architecture.
- Fine-tuning Method: Leverages the ORPO algorithm, a preference-based alignment technique, for instruction tuning (see the training sketch after this list).
- Training Data: Fine-tuned on the `HuggingFaceH4/ultrafeedback_binarized` dataset, which is designed for preference learning.
- Context Length: Supports a context window of 2048 tokens.
- Purpose: Primarily serves as a demonstration and testbed for the ORPO fine-tuning approach on a smaller-scale model; the focus is preference-alignment experimentation rather than general-purpose instruction following.
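For readers who want to reproduce the setup, the sketch below shows how a fine-tune like this is typically wired up with trl's `ORPOTrainer`. The exact hyperparameters used for Amu/orpo-phi2 (beta, batch size, epochs) are not published, so the values here are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token  # Phi-2's tokenizer ships without a pad token

# prompt / chosen / rejected preference triples; depending on your trl version
# you may need to flatten the chat-formatted chosen/rejected columns to plain strings.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = ORPOConfig(
    output_dir="orpo-phi2",
    beta=0.1,                       # weight of the odds-ratio term (lambda); assumed, not published
    max_length=2048,                # matches the model's context window
    per_device_train_batch_size=2,  # illustrative; size to your hardware
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # renamed to processing_class in newer trl releases
)
trainer.train()
```

Note that ORPO folds preference optimization into a single training pass over the base model, so no separate SFT stage or reference model is required.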
Good For
- Researchers and Developers: Ideal for those interested in exploring or reproducing the ORPO fine-tuning method.
- Understanding Preference Alignment: Provides a practical example of how ORPO can be applied to align language models with human preferences.
- Resource-Constrained Environments: Its compact ~2.7B parameter size makes it suitable for experimentation where larger models would be prohibitive (see the loading example after this list).
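As a starting point, the snippet below shows one way to load and query the checkpoint with transformers. The prompt format used during fine-tuning is not documented, so a plain free-text prompt is assumed here; Phi-2's usual "Instruct: ... Output:" convention may also work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Amu/orpo-phi2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~2.7B params fit on a single consumer GPU at fp16
    device_map="auto",          # requires the accelerate package
)

prompt = "Explain in two sentences what preference optimization does for a language model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```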