princeton-nlp/Llama-3-Base-8B-SFT-ORPO

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 8K · Published: May 17, 2024 · Architecture: Transformer · Warm

princeton-nlp/Llama-3-Base-8B-SFT-ORPO is an 8-billion-parameter language model based on the Llama 3 architecture, developed by princeton-nlp. It is fine-tuned from an SFT checkpoint with ORPO (Odds Ratio Preference Optimization) and was released as a baseline alongside the SimPO (Simple Preference Optimization with a Reference-Free Reward) work. ORPO aligns model outputs with human preferences without requiring a separate reference model.


Overview

princeton-nlp/Llama-3-Base-8B-SFT-ORPO is an 8-billion-parameter language model built on the Llama 3 architecture. Developed by princeton-nlp, it is trained with ORPO (Odds Ratio Preference Optimization) and released as one of the baseline checkpoints accompanying the SimPO: Simple Preference Optimization with a Reference-Free Reward preprint.
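The checkpoint can be queried with the standard Hugging Face transformers API. The sketch below is illustrative rather than an official recipe: the generation settings are arbitrary, and it assumes the checkpoint ships a chat template (as the SimPO-family releases do).

```python
# Sketch: querying the model via Hugging Face transformers (assumed installed).
# The message format below is what AutoTokenizer.apply_chat_template expects.

MODEL_ID = "princeton-nlp/Llama-3-Base-8B-SFT-ORPO"

def make_messages(user_prompt: str) -> list[dict]:
    # A single-turn conversation in the transformers chat format.
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    # Heavy imports kept here so the helper above stays dependency-free.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    prompt_ids = tokenizer.apply_chat_template(
        make_messages("Summarize ORPO in one sentence."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(prompt_ids, max_new_tokens=128, do_sample=False)
    print(tokenizer.decode(output[0, prompt_ids.shape[1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here only to make the output reproducible; sampling settings can be swapped in as needed.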

Key Capabilities

  • Preference Optimization: Fine-tuned with ORPO to align model outputs with human preferences.
  • No Reference Model: ORPO folds the preference objective into the supervised loss, so training needs neither a frozen reference model nor a separately trained reward model.
  • Llama 3 Base: Inherits the foundational capabilities of the Llama 3 8B base model.
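To make the "no reference model" point concrete, here is a minimal sketch of the ORPO objective in plain Python. The odds-ratio term is computed from the policy's own length-normalized likelihoods of the chosen and rejected responses, so no frozen reference model appears anywhere. The λ weight and the log-probability inputs below are made-up illustrations, not the paper's hyperparameters.

```python
import math

def log_odds(avg_logp: float) -> float:
    # odds(y|x) = P / (1 - P), with P the length-normalized sequence likelihood.
    p = math.exp(avg_logp)
    return math.log(p / (1.0 - p))

def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float,
              lam: float = 0.1) -> float:
    # ORPO total loss: L = L_SFT + lam * L_OR, where
    # L_OR = -log sigmoid(log-odds(chosen) - log-odds(rejected)).
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log(sigmoid(ratio))
    l_sft = -avg_logp_chosen  # standard NLL on the chosen response
    return l_sft + lam * l_or
```

When the chosen response is already more likely than the rejected one, the odds-ratio term is small and the loss is dominated by the SFT term; when the rejected response is more likely, the penalty grows, pushing the two apart during training.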

Good For

  • Researchers and developers exploring advanced preference optimization techniques.
  • Applications that need improved alignment without the overhead of training a separate reward or reference model.
  • Baseline comparisons for reference-free preference optimization methods such as SimPO.