princeton-nlp/Llama-3-Base-8B-SFT-DPO
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: May 17, 2024 · Architecture: Transformer · Warm

princeton-nlp/Llama-3-Base-8B-SFT-DPO is an 8-billion-parameter Llama-3-based language model from Princeton NLP. Despite being released alongside the SimPO (Simple Preference Optimization with a Reference-Free Reward) research, this checkpoint is the DPO baseline: the group's SFT checkpoint fine-tuned with DPO (Direct Preference Optimization), which aligns the model to pairwise human preferences without training a separate reward model. It offers an 8192-token context window and is documented in the SimPO preprint.


princeton-nlp/Llama-3-Base-8B-SFT-DPO Overview

This model is an 8-billion-parameter variant of the Llama-3 architecture, developed by Princeton NLP as part of the research presented in the preprint SimPO: Simple Preference Optimization with a Reference-Free Reward, where it serves as the DPO-trained baseline.

Key Characteristics

  • Architecture: Llama-3-Base with 8 billion parameters.
  • Optimization Method: fine-tuned from the SFT checkpoint with DPO (Direct Preference Optimization), which aligns the model to human preferences directly on pairwise preference data, using a frozen reference model in place of a separately trained reward model.
  • Context Window: supports an 8192-token context length.
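Like other chat-tuned Llama-3 checkpoints, this model expects prompts in the Llama-3 chat format rather than raw text. The sketch below renders a message list into that format; it is illustrative only, since the authoritative template ships with the model's tokenizer and should be applied via `tokenizer.apply_chat_template`:

```python
def format_llama3_chat(messages: list[dict]) -> str:
    """Render a message list into the Llama-3 chat layout.

    Mirrors the <|start_header_id|>/<|eot_id|> token structure used by
    Llama-3 models; treat the tokenizer's own chat_template as the
    source of truth.
    """
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant turn so generation continues as the assistant.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_chat([{"role": "user", "content": "Hello!"}])
print(prompt)
```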

What Makes This Model Different?

Unlike preference-tuned models aligned through a separately trained reward model (as in classic RLHF), this model is aligned with DPO, which optimizes the policy directly on pairwise preference data: the reward signal is implicit in the log-probability ratio between the policy and a frozen reference model. In the SimPO paper this checkpoint serves as the primary baseline for SimPO itself, which goes a step further and removes the reference model entirely, using a length-normalized policy log-probability as a reference-free reward.
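DPO and SimPO differ mainly in how the pairwise reward is defined, and the contrast fits in a few lines. The toy sketch below uses made-up sequence log-probabilities, and the β and γ values are illustrative defaults, not the paper's tuned hyperparameters:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(pi_w: float, pi_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """DPO: the reward is the log-ratio between the policy and a frozen
    reference model, so both models must be kept in memory."""
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return -math.log(sigmoid(margin))

def simpo_loss(pi_w: float, pi_l: float, len_w: int, len_l: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO: the reward is the length-normalized policy log-probability
    alone, with a target margin gamma; no reference model is needed."""
    margin = beta * (pi_w / len_w - pi_l / len_l) - gamma
    return -math.log(sigmoid(margin))

# Toy sequence log-probabilities (sums over tokens), purely illustrative.
chosen, rejected = -12.0, -30.0          # policy log-probs
ref_chosen, ref_rejected = -14.0, -25.0  # reference log-probs (DPO only)

print(dpo_loss(chosen, rejected, ref_chosen, ref_rejected))
print(simpo_loss(chosen, rejected, len_w=10, len_l=12))
```

Both losses shrink as the model assigns relatively more probability to the chosen response; the practical difference is that SimPO drops the two reference-model terms, halving the models needed during preference training.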

Should You Use This Model?

This model is particularly well-suited for use cases where:

  • You require a Llama-3-based model with strong preference alignment.
  • You want a well-documented DPO checkpoint to use as a baseline when evaluating reference-free methods such as SimPO.
  • Your application benefits from a model aligned directly on human preference data through a simple, stable training pipeline.
Popular Sampler Settings

The top three parameter combinations used by Featherless users for this model cover the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
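These parameters map directly onto a chat-completions-style request for an OpenAI-compatible endpoint serving this model. The sketch below only builds the request payload; the values shown are illustrative defaults, not the user-popular combinations, and note that `repetition_penalty` and `min_p` are extensions common to open-source serving stacks rather than standard OpenAI fields:

```python
import json

def build_request(prompt: str) -> dict:
    """Build a chat-completions payload with explicit sampler settings.

    All sampler values here are illustrative; tune them for your use case.
    """
    return {
        "model": "princeton-nlp/Llama-3-Base-8B-SFT-DPO",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        "repetition_penalty": 1.1,  # non-standard OpenAI field
        "min_p": 0.05,              # non-standard OpenAI field
        "max_tokens": 256,
    }

payload = build_request("Summarize direct preference optimization in one sentence.")
print(json.dumps(payload, indent=2))
```

To send it, POST the payload as JSON to your provider's `/v1/chat/completions` endpoint with your API key in the `Authorization` header.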