princeton-nlp/Llama-3-Base-8B-SFT-ORPO

Status: Warm · Visibility: Public · Parameters: 8B · Precision: FP8 · Context length: 8192 · Source: Hugging Face
Overview

princeton-nlp/Llama-3-Base-8B-SFT-ORPO is an 8-billion-parameter language model built on the Llama 3 architecture. Developed by princeton-nlp, it starts from a supervised fine-tuned (SFT) Llama-3-8B base checkpoint and is further trained with ORPO (Odds Ratio Preference Optimization), a single-stage, reference-model-free alignment method introduced in the paper "ORPO: Monolithic Preference Optimization without Reference Model". This checkpoint was released as one of the baseline models evaluated in the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward".
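To make the objective concrete, below is a minimal PyTorch sketch of the ORPO loss. It is an illustration of the method, not the training code behind this checkpoint: the function and argument names (`orpo_loss`, `chosen_logps`, `rejected_logps`, `sft_nll`, `beta`) are hypothetical, the log-probabilities are assumed to be length-normalized per-sequence values, and the default `beta` is not the released recipe.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, sft_nll, beta=0.1):
    """Illustrative ORPO objective: SFT loss plus an odds-ratio penalty.

    chosen_logps / rejected_logps: length-normalized log-probabilities of
    the preferred and dispreferred responses under the policy itself --
    no frozen reference model and no separate reward model are involved.
    sft_nll: mean negative log-likelihood on the preferred responses.
    beta: penalty weight (a hypothetical default, not the released recipe).
    """
    # log odds(y) = log p(y) - log(1 - p(y)), computed from log p(y).
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Penalize whenever the odds of the rejected response approach
    # (or exceed) the odds of the chosen response.
    odds_ratio_term = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # Monolithic, single-stage objective: no separate reward-model phase.
    return sft_nll + beta * odds_ratio_term.mean()
```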

Key Capabilities

  • Preference Optimization: Aligns model outputs with human preference data using the ORPO objective (sketched above).
  • Reference-Model-Free: ORPO folds preference alignment into a single SFT-style training stage, requiring neither a frozen reference policy nor a separately trained reward model.
  • Llama 3 Base: Inherits the pretraining and 8192-token context window of the 8B Llama 3 base model; a minimal loading example follows this list.
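
A minimal way to try the checkpoint with the Hugging Face transformers library is sketched below. It assumes the tokenizer ships a chat template (as the princeton-nlp SFT-derived checkpoints typically do) and enough GPU memory for bf16 weights; the prompt text is a placeholder, and dtype/device settings should be adjusted to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Base-8B-SFT-ORPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; use float16/float32 as needed
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what ORPO changes about preference tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```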

Good For

  • Researchers and developers exploring advanced preference optimization techniques.
  • Applications that need preference-aligned outputs without the training and serving overhead of a separate reward or reference model.
  • Reproducing or extending the baseline comparisons reported in the SimPO study.