princeton-nlp/Llama-3-Instruct-8B-RRHF

Available on Hugging Face

Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 8k · Concurrency Cost: 1 · Published: Jul 6, 2024 · Architecture: Transformer

Llama-3-Instruct-8B-RRHF is an 8 billion parameter instruction-tuned language model released by princeton-nlp. It is fine-tuned with RRHF (Rank Responses to align Human Feedback), a preference optimization method that ranks sampled responses by reward and does not require a separate reference model; the model was released as a baseline alongside the SimPO preprint. It is designed for general instruction-following tasks, using this preference optimization approach to improve response quality.


Model Overview

princeton-nlp/Llama-3-Instruct-8B-RRHF is an 8 billion parameter instruction-tuned language model built on Llama 3 Instruct. It distinguishes itself through its fine-tuning methodology: RRHF (Rank Responses to align Human Feedback), a ranking-based preference optimization technique evaluated in the SimPO preprint. RRHF aligns the model directly from reward-ranked responses, without the separate reference model that methods such as DPO require.
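To make the fine-tuning objective concrete, the following is a minimal, illustrative sketch of the RRHF loss (a ranking hinge term over length-normalized response log-probabilities, plus a cross-entropy term on the best-rewarded response). The function names and toy numbers are illustrative, not the authors' training code:

```python
# Toy sketch of the RRHF objective: rank sampled responses by an external
# reward, then penalize the policy when a lower-reward response is scored
# higher than a better one, while also doing SFT on the best response.
# All names and values here are illustrative, not the released training code.

def length_normalized_logprob(token_logprobs):
    """Average per-token log-probability of one candidate response."""
    return sum(token_logprobs) / len(token_logprobs)

def rrhf_loss(candidates, rewards):
    """candidates: per-token log-prob lists, one per sampled response.
    rewards: external reward scores, used only to rank the candidates.
    Returns ranking hinge loss + cross-entropy on the best response."""
    p = [length_normalized_logprob(c) for c in candidates]
    # Ranking term: hinge penalty whenever a lower-reward candidate
    # out-scores a higher-reward one under the policy.
    rank_loss = 0.0
    for i in range(len(p)):
        for j in range(len(p)):
            if rewards[i] < rewards[j]:
                rank_loss += max(0.0, p[i] - p[j])
    # SFT term: negative log-likelihood of the highest-reward response.
    best = max(range(len(rewards)), key=lambda k: rewards[k])
    sft_loss = -sum(candidates[best])
    return rank_loss + sft_loss

# If the better-rewarded response already has the higher log-prob,
# the ranking term vanishes and only the SFT term remains.
loss_ok = rrhf_loss([[-1.0, -1.0], [-2.0, -2.0]], [1.0, 0.0])
# If the ranking is violated, the hinge term adds the log-prob gap.
loss_bad = rrhf_loss([[-3.0], [-1.0]], [1.0, 0.0])
```

Note that the reward only determines the ordering of candidates; no reference policy appears anywhere in the loss, which is what makes the method reference-free.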

Key Capabilities

  • Instruction Following: Designed to accurately follow a wide range of user instructions.
  • Preference Optimization: Trained with RRHF, a ranking-based, reference-free objective benchmarked against SimPO (Simple Preference Optimization) for response quality and alignment.
  • Efficient Fine-tuning: By dropping the reference model used in methods such as DPO, RRHF can simplify the preference-training pipeline.

Good For

  • Developers interested in exploring alternative preference optimization methods for instruction-tuned models.
  • General-purpose conversational AI and instruction-based tasks where high-quality, aligned responses are crucial.
  • Research into reward modeling and alignment techniques for large language models.
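Since the model inherits the Llama 3 Instruct chat format from its base model, prompts must follow that template. Below is a hypothetical, hand-rolled formatter for illustration; in practice you would load the tokenizer from Hugging Face and call `tokenizer.apply_chat_template` instead:

```python
# Illustrative sketch of the Llama 3 Instruct chat format this model inherits.
# Building the string by hand is for clarity only; prefer the tokenizer's
# apply_chat_template in real code.

def format_llama3_prompt(user_message, system_message=None):
    parts = ["<|begin_of_text|>"]
    if system_message:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        )
    parts.append(
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
    )
    # Open the assistant turn so generation continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_prompt("Summarize RRHF in one sentence.")
```

The trailing open assistant header is what cues the model to generate its reply; omitting it is a common cause of degraded instruct-model output.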