princeton-nlp/Llama-3-Instruct-8B-IPO

Warm
Public
8B
FP8
8192
Hugging Face
Overview

princeton-nlp/Llama-3-Instruct-8B-IPO is an 8-billion-parameter instruction-tuned language model. It is based on Meta's Llama 3 8B Instruct and has been fine-tuned with IPO (Identity Preference Optimization), a direct preference optimization method that learns from pairwise preference data without training a separate reward model. The checkpoint was released as one of the baselines in the SimPO project, whose preprint, "SimPO: Simple Preference Optimization with a Reference-Free Reward," compares IPO, SimPO, and related objectives.
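
For readers exploring the training objective, below is a minimal sketch of the IPO loss as described by Azar et al. It assumes sequence-level log-probabilities have already been computed; the function name, tensor names, and the default tau value are illustrative, not the exact training configuration used for this checkpoint.

```python
import torch

def ipo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             tau: float = 0.1) -> torch.Tensor:
    # Log-ratio of the policy to the frozen reference model for each response.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # IPO regresses the chosen-vs-rejected gap toward the fixed margin 1/(2*tau),
    # avoiding the Bradley-Terry reward-model assumption used in RLHF pipelines.
    gap = chosen_ratio - rejected_ratio
    return torch.mean((gap - 1.0 / (2.0 * tau)) ** 2)
```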

Key Capabilities

  • Instruction Following: Optimized for understanding and executing a wide range of user instructions (see the usage sketch after this list).
  • Preference Alignment: Benefits from IPO fine-tuning, which improves alignment of its responses with human preferences.
  • Context Handling: Supports an 8192 token context window, enabling processing of moderately long inputs.
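
The snippet below is a minimal inference sketch using the standard Hugging Face transformers chat-template API; the prompt and generation parameters (temperature, max_new_tokens) are illustrative defaults, not recommended settings for this model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Instruct-8B-IPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; torch_dtype="auto"
# loads the weights in the dtype stored in the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize the difference between IPO and DPO."}
]
# Llama 3 Instruct models use a chat template; apply it before generating.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```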

Good For

  • Applications requiring a robust instruction-tuned model with improved preference alignment.
  • Research and development on preference optimization techniques, particularly IPO and its comparison with SimPO.
  • General-purpose conversational AI and task execution where nuanced responses are valued.

For more details, refer to the SimPO repository and the associated preprint.