princeton-nlp/Llama-3-Base-8B-SFT-ORPO
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: May 17, 2024 · Architecture: Transformer · Warm
princeton-nlp/Llama-3-Base-8B-SFT-ORPO is an 8 billion parameter language model based on the Llama 3 architecture, developed by princeton-nlp. It is fine-tuned with ORPO (Odds Ratio Preference Optimization) on top of a supervised fine-tuned (SFT) Llama 3 base, and was released as a baseline in the SimPO research. It is designed for preference optimization tasks, aligning model outputs with human preferences without requiring a separate reference model.
Overview
princeton-nlp/Llama-3-Base-8B-SFT-ORPO is an 8 billion parameter language model built upon the Llama 3 architecture. Developed by princeton-nlp, this model is fine-tuned with ORPO (Odds Ratio Preference Optimization), a preference-alignment technique evaluated as a baseline in the SimPO: Simple Preference Optimization with a Reference-Free Reward preprint.
Key Capabilities
- Preference Optimization: Utilizes the ORPO method for aligning model outputs with human preferences.
- Reference-Model-Free: Optimizes preferences directly, without the separate frozen reference model that methods such as DPO require.
- Llama 3 Base: Benefits from the foundational capabilities of the Llama 3 architecture.
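The odds-ratio objective behind these capabilities can be sketched in a few lines. This is an illustrative toy sketch of the ORPO loss as commonly stated (SFT negative log-likelihood on the chosen response plus a scaled odds-ratio penalty), not the training code used for this checkpoint; the function names and the toy log-probabilities are ours.

```python
import math

def log_odds(avg_logp: float) -> float:
    """Log odds of a sequence from its length-averaged log-probability.

    odds(y|x) = P / (1 - P), with P = exp(avg_logp); avg_logp must be < 0.
    """
    p = math.exp(avg_logp)
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(avg_logp_chosen: float,
              avg_logp_rejected: float,
              sft_nll: float,
              lam: float = 0.1) -> float:
    """ORPO objective: SFT loss on the chosen response plus
    lam * (-log sigmoid(log-odds ratio of chosen over rejected))."""
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid(ratio)
    return sft_nll + lam * l_or

# Toy numbers: preferring the chosen response yields a lower loss
# than preferring the rejected one.
better = orpo_loss(avg_logp_chosen=-0.5, avg_logp_rejected=-2.0, sft_nll=0.5)
worse = orpo_loss(avg_logp_chosen=-2.0, avg_logp_rejected=-0.5, sft_nll=0.5)
```

Because the penalty depends only on the current policy's own odds, no frozen reference model is kept in memory during training.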
Good For
- Researchers and developers exploring advanced preference optimization techniques.
- Applications that need better-aligned fine-tuned models without training a separate reward model.
- Experimentation with the SimPO methodology for language model training.
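For context on the SimPO methodology mentioned above, its reference-free reward is simple enough to sketch directly: the reward is the length-normalized log-likelihood of a response scaled by beta, and the loss is a Bradley-Terry-style term with a target margin gamma. This is a toy illustration with made-up numbers, not the authors' implementation.

```python
import math

def simpo_reward(sum_logp: float, length: int, beta: float = 2.0) -> float:
    """SimPO's reference-free reward: beta-scaled, length-normalized
    log-likelihood of the response under the current policy."""
    return beta * sum_logp / length

def simpo_loss(sum_logp_w: float, len_w: int,
               sum_logp_l: float, len_l: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """-log sigmoid(reward margin - gamma) over a (winner, loser) pair."""
    margin = (simpo_reward(sum_logp_w, len_w, beta)
              - simpo_reward(sum_logp_l, len_l, beta)
              - gamma)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid

# Toy pair: the preferred response is more likely per token,
# so the loss is lower than for the swapped pair.
loss_good = simpo_loss(sum_logp_w=-5.0, len_w=10, sum_logp_l=-20.0, len_l=10)
loss_bad = simpo_loss(sum_logp_w=-20.0, len_w=10, sum_logp_l=-5.0, len_l=10)
```

The length normalization is what distinguishes SimPO's reward from simply comparing sequence log-likelihoods, and it removes the need for a reference model entirely.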