Amu/orpo-phi2

Text generation · Model size: 3B · Quantization: BF16 · Context length: 2k · Concurrency cost: 1 · Published: Apr 1, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

Amu/orpo-phi2 is a 3 billion parameter language model fine-tuned from Microsoft's Phi-2 base model. It was trained with the ORPO (Odds Ratio Preference Optimization) method via the TRL library on the Ultrafeedback dataset. The model serves as a test implementation of the ORPO fine-tuning approach, demonstrating its application on a compact yet capable base model; its primary focus is exploring preference alignment techniques rather than general-purpose instruction following.


Amu/orpo-phi2: ORPO Fine-tuning Experiment

Amu/orpo-phi2 is a 3 billion parameter language model derived from Microsoft's Phi-2 base model. This model represents an experimental fine-tuning effort using the ORPO (Odds Ratio Preference Optimization) method, implemented through the trl library.

Key Capabilities & Characteristics

  • Base Model: Built upon the efficient and capable microsoft/phi-2 architecture.
  • Fine-tuning Method: Leverages the ORPO algorithm, a preference-based alignment technique that folds preference optimization into supervised fine-tuning, so no separate reference or reward model is required.
  • Training Data: Fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset, which is designed for preference learning.
  • Context Length: Supports a context window of 2048 tokens.
  • Purpose: Primarily serves as a demonstration and testbed for the ORPO fine-tuning approach, showcasing its application on a smaller-scale model.
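The core idea behind ORPO can be illustrated with its preference term: the odds of the chosen response should exceed the odds of the rejected one, and the loss penalizes the model when they do not. Below is a minimal, framework-free sketch of that term; the function names are illustrative and not part of the trl API.

```python
import math

def sequence_odds(p: float) -> float:
    """Odds of a sequence given its likelihood p under the model."""
    return p / (1.0 - p)

def orpo_preference_loss(p_chosen: float, p_rejected: float) -> float:
    """-log sigmoid(log odds ratio): small when the chosen response
    is much more likely than the rejected one, large otherwise."""
    log_odds_ratio = (math.log(sequence_odds(p_chosen))
                      - math.log(sequence_odds(p_rejected)))
    return -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))

# Equal likelihoods give the maximum-uncertainty penalty of log 2.
print(round(orpo_preference_loss(0.5, 0.5), 4))  # 0.6931
```

In the full ORPO objective this term is added, with a weighting coefficient, to the ordinary negative log-likelihood on the chosen responses, which is why a single training stage suffices.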

Good For

  • Researchers and Developers: Ideal for those interested in exploring or reproducing the ORPO fine-tuning method.
  • Understanding Preference Alignment: Provides a practical example of how ORPO can be applied to align language models with human preferences.
  • Resource-Constrained Environments: Its 3B parameter size makes it suitable for experimentation where larger models would be prohibitively expensive.
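As a quick way to try the model, it can be loaded through the standard transformers API. This is a hedged sketch, not an official quickstart: the generation settings are illustrative, and the `generate` helper is a name introduced here for convenience.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt: str, model_id: str = "Amu/orpo-phi2",
             max_new_tokens: int = 128) -> str:
    """Download the checkpoint from the Hub and complete the prompt."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

The `torch_dtype="auto"` argument lets transformers pick up the checkpoint's native BF16 weights rather than upcasting to FP32.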