princeton-nlp/Llama-3-Instruct-8B-ORPO

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:May 17, 2024Architecture:Transformer Warm

princeton-nlp/Llama-3-Instruct-8B-ORPO is an 8 billion parameter language model developed by princeton-nlp, based on the Llama-3 architecture with an 8192 token context length. This model is fine-tuned using the SimPO (Simple Preference Optimization) method, which is a reference-free reward approach for preference optimization. It is designed to demonstrate the effectiveness of SimPO as detailed in the associated research preprint.

Loading preview...

Model Overview

princeton-nlp/Llama-3-Instruct-8B-ORPO is an 8 billion parameter language model derived from the Llama-3 architecture, featuring an 8192 token context length. Developed by princeton-nlp, this model's primary distinction lies in its fine-tuning methodology: it utilizes SimPO (Simple Preference Optimization). SimPO is a novel, reference-free reward approach for preference optimization, as introduced in the research preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward".

Key Characteristics

  • Architecture: Llama-3 base model.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports an 8192 token context window.
  • Fine-tuning Method: Employs SimPO, a preference optimization technique that operates without requiring a reference reward model.

Intended Use

This model serves as an implementation and demonstration of the SimPO fine-tuning method. It is particularly relevant for researchers and developers interested in:

  • Exploring alternative preference optimization techniques for large language models.
  • Understanding the practical application of reference-free reward methods.
  • Evaluating the performance of models fine-tuned with SimPO, as detailed in the associated research paper and repository.