princeton-nlp/Llama-3-Instruct-8B-DPO

8B parameters · FP8 · 8192-token context · Hugging Face

Overview

princeton-nlp/Llama-3-Instruct-8B-DPO is an 8-billion-parameter instruction-tuned language model. Developed by princeton-nlp, it is built on the Llama-3 architecture and supports an 8192-token context window. As its name indicates, this checkpoint was fine-tuned with DPO (Direct Preference Optimization) on preference data; it was released as part of the SimPO (Simple Preference Optimization) project, which introduces a reference-free reward as an alternative to DPO's reference-model-based objective.
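
A minimal usage sketch with the Hugging Face transformers library follows. The loading pattern and generation settings here are illustrative assumptions, not an official recipe; the model ships with the standard Llama-3 Instruct chat template.

```python
# Minimal sketch: load the model and run one chat turn.
# Assumes `transformers` and `torch` are installed and enough GPU
# memory is available for an 8B model; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Instruct-8B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format the prompt with the model's built-in Llama-3 chat template.
messages = [{"role": "user", "content": "Summarize what preference optimization does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```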

Key Capabilities

  • Instruction Following: Optimized for understanding and executing user instructions.
  • Preference Alignment: Fine-tuned with DPO on preference data to align outputs with human preferences; the related SimPO method from the same project removes the need for a reference model (both objectives are sketched under Training Details).
  • Conversational AI: Suitable for generating coherent and contextually relevant responses in dialogue systems.

Training Details

This model's training recipe is described in the preprint SimPO: Simple Preference Optimization with a Reference-Free Reward, where this DPO checkpoint serves as a point of comparison for the SimPO method. Further technical information and resources are available in the associated repository.
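
To make the reference-free distinction concrete, below is an illustrative sketch of the two objectives discussed in the preprint: DPO scores completions by log-probability ratios against a frozen reference model, while SimPO uses the length-normalized policy log-probability with a target margin. Function names, shapes, and default hyperparameters are assumptions for illustration, not the authors' training code.

```python
# Side-by-side sketch of the DPO and SimPO preference losses.
# All *_logps arguments are summed token log-probs per sequence, shape (batch,).
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO reward: beta-scaled log-ratio between policy and frozen reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def simpo_loss(policy_chosen_logps, policy_rejected_logps,
               chosen_lengths, rejected_lengths, beta=2.0, gamma=1.0):
    # SimPO reward: length-normalized policy log-prob, no reference model;
    # gamma is a target margin between chosen and rejected rewards.
    chosen_rewards = beta * policy_chosen_logps / chosen_lengths
    rejected_rewards = beta * policy_rejected_logps / rejected_lengths
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```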

Use Cases

  • General-purpose instruction following.
  • Chatbot development and conversational agents (see the chat-loop sketch after this list).
  • Tasks requiring preference-aligned text generation.
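
As a sketch of the chatbot use case, recent transformers releases let you pass a running message list straight to a text-generation pipeline. The loop below and its prompts are illustrative assumptions.

```python
# Illustrative multi-turn chat loop; assumes a recent `transformers`
# release whose text-generation pipeline accepts chat-format messages.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="princeton-nlp/Llama-3-Instruct-8B-DPO",
    device_map="auto",
    torch_dtype="auto",
)

history = [{"role": "system", "content": "You are a concise, helpful assistant."}]
for user_turn in ["What is preference optimization?",
                  "How does it differ from supervised fine-tuning?"]:
    history.append({"role": "user", "content": user_turn})
    result = chat(history, max_new_tokens=200)
    # The pipeline returns the full conversation; the last message is the reply.
    reply = result[0]["generated_text"][-1]["content"]
    history.append({"role": "assistant", "content": reply})
    print(f"assistant: {reply}\n")
```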