princeton-nlp/Llama-3-Base-8B-SFT-RRHF

Text generation · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Jul 6, 2024 · Architecture: Transformer

princeton-nlp/Llama-3-Base-8B-SFT-RRHF is a language model developed by princeton-nlp, based on the Llama-3 architecture. As its name indicates, the model is fine-tuned from a supervised fine-tuned (SFT) Llama-3 base using RRHF (Rank Responses to align Human Feedback), and it was released as a baseline accompanying the research preprint SimPO: Simple Preference Optimization with a Reference-Free Reward. It is designed to support comparisons between RRHF and other preference optimization techniques.


Model Overview

This model, princeton-nlp/Llama-3-Base-8B-SFT-RRHF, is a Llama-3-based language model developed by princeton-nlp. Its primary distinction lies in its fine-tuning methodology: starting from an SFT checkpoint, it is trained with RRHF (Rank Responses to align Human Feedback), a ranking-based preference optimization method. It was released as one of the baseline models accompanying the research preprint SimPO: Simple Preference Optimization with a Reference-Free Reward.

Key Characteristics

  • Preference Optimization: Fine-tuned using RRHF, a ranking-based method that encourages the model's length-normalized log-probabilities to agree with a reward ranking over candidate responses, without requiring a separate reference model during training.
  • Research-Oriented: Primarily serves as a baseline for the SimPO preprint, allowing researchers and developers to compare RRHF against SimPO and other preference optimization techniques in a controlled setting.
  • Llama-3 Base: Built upon the Llama-3 architecture, inheriting its foundational language understanding and generation capabilities.
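
The RRHF objective named above can be sketched on toy numbers. This is a minimal illustration, not the authors' implementation: it assumes each candidate response has already been scored by its total log-probability under the policy, and that an external reward model has supplied a best-to-worst ranking.

```python
def rrhf_loss(logprob_sums, lengths, reward_ranking):
    """Sketch of the RRHF objective over pre-scored candidate responses.

    logprob_sums:   total log-probability of each candidate under the policy
    lengths:        token counts, used for length normalization
    reward_ranking: candidate indices sorted best-to-worst by reward
    """
    # Length-normalized log-probability acts as the model's own score.
    p = [lp / n for lp, n in zip(logprob_sums, lengths)]
    # Ranking term: hinge penalty whenever a lower-ranked response
    # outscores a higher-ranked one under the policy.
    rank_loss = 0.0
    for i, hi in enumerate(reward_ranking):
        for lo in reward_ranking[i + 1:]:
            rank_loss += max(0.0, p[lo] - p[hi])
    # SFT term: cross-entropy on the best-ranked response.
    best = reward_ranking[0]
    sft_loss = -logprob_sums[best] / lengths[best]
    return rank_loss + sft_loss

# Toy example with three candidates; the reward ranking disagrees with the
# policy's own scores, so the ranking term is nonzero.
loss = rrhf_loss(
    logprob_sums=[-30.0, -12.0, -18.0],
    lengths=[12.0, 10.0, 9.0],
    reward_ranking=[0, 2, 1],
)
print(loss)  # ranking penalty plus SFT term
```

In actual RRHF training, the log-probabilities come from a forward pass over each sampled response and the ranking from a reward model; this sketch only shows the shape of the loss.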

When to Use This Model

  • Research and Development: Ideal for researchers interested in preference optimization techniques, particularly those exploring alternatives to traditional methods.
  • Evaluating Preference Optimization Methods: Useful for developers and researchers who want to benchmark RRHF-trained models against SimPO and other preference optimization strategies.
  • Understanding Novel Fine-tuning: Provides a practical example of a model trained with a ranking-based preference objective, offering insights into its behavior and potential applications.
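
For contrast, the SimPO objective that this baseline is compared against uses a reference-free reward: the length-normalized log-probability of a response scaled by β, with a target margin γ between the winning and losing responses. A minimal sketch on toy numbers (the β and γ values here are illustrative, not the paper's tuned settings):

```python
import math

def simpo_loss(logp_w, len_w, logp_l, len_l, beta=2.0, gamma=0.5):
    """Sketch of SimPO: a Bradley-Terry-style loss on length-normalized
    log-probabilities, with no reference model in the reward."""
    r_w = beta * logp_w / len_w  # reward of the winning (preferred) response
    r_l = beta * logp_l / len_l  # reward of the losing response
    margin = r_w - r_l - gamma
    # Negative log-sigmoid of the margin, i.e. softplus(-margin).
    return math.log1p(math.exp(-margin))

# Toy example: the preferred response already has the higher per-token
# log-probability, so the loss is small.
loss = simpo_loss(logp_w=-12.0, len_w=10, logp_l=-30.0, len_l=12)
print(loss)
```

Unlike RRHF's hinge-style ranking over several candidates, SimPO operates on a single winning/losing pair and needs neither a reward model at training time nor a frozen reference policy.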