princeton-nlp/Llama-3-Instruct-8B-IPO

Hugging Face · Text Generation
Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 8k · Architecture: Transformer · Warm

Llama-3-Instruct-8B-IPO is an 8 billion parameter instruction-tuned language model released by princeton-nlp. It is fine-tuned with IPO (Identity Preference Optimization), one of the preference-optimization baselines trained and released as part of the SimPO project, making it well suited to tasks requiring nuanced preference alignment. It is designed for general instruction following with an 8192 token context length.


Overview

princeton-nlp/Llama-3-Instruct-8B-IPO is an 8 billion parameter instruction-tuned language model. It is based on the Llama 3 architecture and has been fine-tuned using IPO (Identity Preference Optimization), which replaces DPO's logistic loss with a squared regression objective on the preference margin. The model was trained and released by the princeton-nlp team as one of the baselines studied in the preprint "SimPO: Simple Preference Optimization with a Reference-Free Reward."
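As a rough illustration of the objective (not the authors' actual training code), the IPO loss regresses the reference-normalized log-probability margin between a chosen and a rejected response toward 1/(2τ). The function name and the example numbers below are hypothetical:

```python
def ipo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             tau: float = 0.1) -> float:
    """IPO objective for a single preference pair (Azar et al.).

    The reference-normalized margin h is regressed toward 1/(2*tau)
    with a squared loss, instead of DPO's logistic loss.
    """
    # Margin of policy log-ratios over reference log-ratios.
    h = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    return (h - 1.0 / (2.0 * tau)) ** 2

# When the margin already equals 1/(2*tau), the loss is zero:
# here h = 0 - (-5) = 5 and 1/(2*0.1) = 5.
print(ipo_loss(-1.0, -6.0, -1.0, -1.0, tau=0.1))  # → 0.0
```

In training this would be averaged over a batch of preference pairs, with the log-probabilities produced by the policy and frozen reference models.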

Key Capabilities

  • Instruction Following: Optimized for understanding and executing a wide range of user instructions.
  • Preference Alignment: Fine-tuned with the IPO objective, which enhances its ability to align responses with human preferences.
  • Context Handling: Supports an 8192 token context window, enabling processing of moderately long inputs.
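The 8192 token window is shared between the prompt and the generated reply. For reference, a Llama 3 Instruct prompt is laid out with per-turn header tokens; in practice, prefer `tokenizer.apply_chat_template` from Hugging Face transformers, which uses the template bundled with the model. The helper below is a hypothetical hand-rolled equivalent based on the published Llama 3 chat format:

```python
from typing import Optional

def format_llama3_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Assemble a Llama 3 Instruct-style prompt by hand.

    Each turn is wrapped in <|start_header_id|>role<|end_header_id|>
    and terminated with <|eot_id|>; the trailing assistant header
    cues the model to generate its reply.
    """
    parts = ["<|begin_of_text|>"]
    if system_message is not None:
        parts.append(f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>")
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>")
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_prompt("Summarize IPO in one sentence.")
print(prompt)
```

Verify any hand-rolled formatting against the model's own tokenizer before relying on it, since template details can differ between releases.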

Good For

  • Applications requiring a robust instruction-tuned model with improved preference alignment.
  • Research and development into preference optimization techniques, particularly SimPO.
  • General-purpose conversational AI and task execution where nuanced responses are valued.

For more details, refer to the SimPO repository and the associated preprint.