princeton-nlp/Llama-3-Instruct-8B-CPO

Warm · Public · 8B parameters · FP8 · 8192 context · Hugging Face

Model Overview

princeton-nlp/Llama-3-Instruct-8B-CPO is an 8-billion-parameter instruction-tuned language model from princeton-nlp. It was fine-tuned with CPO (Contrastive Preference Optimization) and released as one of the baseline checkpoints accompanying the group's research preprint, SimPO: Simple Preference Optimization with a Reference-Free Reward.
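The checkpoint loads with the standard Hugging Face transformers API. A minimal sketch (the model ID comes from this page; the dtype and device placement are assumptions to adjust for your hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Instruct-8B-CPO"

# Tokenizer and weights are fetched from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; adjust to your hardware/serving stack
    device_map="auto",           # assumption: accelerate is installed for automatic placement
)
```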

Key Characteristics

  • Architecture: Llama-3 model family, built on the 8B Instruct base model.
  • Parameter Count: 8 billion parameters.
  • Optimization Method: Fine-tuned with CPO (Contrastive Preference Optimization), a DPO-style preference objective that drops the reference model and adds a negative log-likelihood term on the preferred response (see the loss sketch after this list).
  • Context Length: Supports an 8192-token context window.
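For intuition, here is a minimal PyTorch sketch of the CPO objective, assuming summed per-sequence log-probabilities for the chosen and rejected responses have already been computed; beta and the NLL weight are illustrative hyperparameters, not the values used to train this checkpoint:

```python
import torch
import torch.nn.functional as F

def cpo_loss(chosen_logps: torch.Tensor,
             rejected_logps: torch.Tensor,
             beta: float = 0.1,
             nll_weight: float = 1.0) -> torch.Tensor:
    """chosen_logps / rejected_logps: [batch] summed log-probs under the policy."""
    # Contrastive preference term: DPO's logistic loss with the
    # reference-model log-ratios dropped, so no frozen reference model is needed.
    prefer = -F.logsigmoid(beta * (chosen_logps - rejected_logps)).mean()
    # Behavior-cloning regularizer: plain NLL on the preferred responses.
    nll = -chosen_logps.mean()
    return prefer + nll_weight * nll
```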

Intended Use Cases

This model is primarily designed for instruction-following applications, where its preference-based fine-tuning aims to improve response quality and alignment. Within the preprint, it also serves as the CPO baseline against which SimPO is compared. Developers exploring preference optimization techniques, or those who need a Llama-3-based model with strengthened instruction following, may find it a suitable choice. Further technical details and implementation specifics are available in the associated repository.
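Because the model keeps the Llama-3 chat format, prompts should be built with the tokenizer's chat template. Continuing the loading sketch above (the prompt and sampling settings are illustrative):

```python
messages = [
    {"role": "user", "content": "Summarize what preference optimization does in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```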