princeton-nlp/Llama-3-Instruct-8B-SimPO

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8K · Published: May 17, 2024 · Architecture: Transformer · Warm: 0.1K

princeton-nlp/Llama-3-Instruct-8B-SimPO is an 8 billion parameter Llama-3-Instruct model developed by Princeton NLP. It is fine-tuned with SimPO (Simple Preference Optimization), a preference optimization method that uses a reference-free reward, and is designed for instruction-following tasks.


Overview

princeton-nlp/Llama-3-Instruct-8B-SimPO is an 8 billion parameter instruction-tuned model based on the Llama-3-Instruct architecture. Developed by Princeton NLP, this model incorporates the novel SimPO (Simple Preference Optimization) method, which is detailed in their research preprint, "SimPO: Simple Preference Optimization with a Reference-Free Reward." This approach distinguishes it from other models by employing a reference-free reward mechanism for preference optimization.
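The core idea behind SimPO can be sketched in a few lines: the implicit reward for a response is its length-normalized average log-likelihood under the policy (no reference model needed), scaled by a factor β, and the pairwise loss pushes the chosen response's reward above the rejected one's by a target margin γ. The sketch below follows the preprint's formulation; the β and γ values are illustrative placeholders, not the paper's tuned hyperparameters.

```python
import math

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=0.5):
    """Sketch of the SimPO pairwise loss (reference-free).

    logp_* are summed token log-probabilities of each response under the
    policy being trained; len_* are response lengths in tokens.
    beta and gamma here are illustrative, not the paper's tuned values.
    """
    # Implicit reward: length-normalized average log-likelihood, scaled by beta.
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    # Negative log-sigmoid of the reward gap minus the target margin gamma.
    z = r_chosen - r_rejected - gamma
    return math.log(1.0 + math.exp(-z))
```

Because the reward depends only on the policy's own log-probabilities, no frozen reference model is kept in memory during training, which is the main practical difference from DPO-style objectives.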

Key Capabilities

  • Instruction Following: Optimized for understanding and executing user instructions.
  • SimPO Optimization: Utilizes a unique Simple Preference Optimization method that operates without requiring a reference reward model.
  • Llama-3-Instruct Base: Built upon the robust Llama-3-Instruct architecture, providing a strong foundation for general language tasks.

Good For

  • Researchers interested in novel preference optimization techniques, particularly those exploring reference-free methods.
  • Applications requiring an 8B parameter instruction-following model with an 8192-token context length.
  • Experimentation with models fine-tuned using advanced, non-traditional alignment strategies.

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model draw on the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
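As a sketch of how these settings are typically passed in an OpenAI-compatible sampling configuration, the dictionary below includes all seven parameters. The values are illustrative placeholders only; the actual user-preferred combinations are not reproduced here.

```python
# Placeholder sampler configuration covering the parameters listed above.
# These values are illustrative defaults, NOT the Featherless user presets.
sampler_config = {
    "temperature": 0.7,         # softmax temperature; lower = more deterministic
    "top_p": 0.9,               # nucleus sampling: keep top tokens within this mass
    "top_k": 40,                # restrict sampling to the k most likely tokens
    "frequency_penalty": 0.0,   # penalize tokens by how often they already appeared
    "presence_penalty": 0.0,    # penalize tokens that appeared at all
    "repetition_penalty": 1.1,  # multiplicative penalty on repeated tokens
    "min_p": 0.05,              # drop tokens below this fraction of the top probability
}
```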