Model Overview
princeton-nlp/Llama-3-Instruct-8B-CPO is an 8-billion-parameter instruction-tuned language model. Developed by princeton-nlp as part of the work behind their research preprint, SimPO: Simple Preference Optimization with a Reference-Free Reward, it is one of the baseline checkpoints from that study: rather than SimPO itself, it is fine-tuned with CPO (Contrastive Preference Optimization).
Key Characteristics
- Architecture: Based on the Llama-3 model family; fine-tuned from the 8B Instruct variant.
- Parameter Count: 8 billion parameters.
- Optimization Method: Fine-tuned using CPO (Contrastive Preference Optimization), a preference optimization technique that drops the frozen reference model from the DPO-style objective and adds a log-likelihood term on the preferred response; a schematic form of the objective is sketched after this list.
- Context Length: Supports an 8192-token context window.
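As a rough sketch, following the description of CPO in Xu et al. (2024) that the SimPO work uses as a baseline (the exact loss weighting and training hyperparameters used for this checkpoint are documented in the associated repository), the objective combines a reference-free pairwise preference loss with a negative log-likelihood term on the chosen response:

$$
\mathcal{L}_{\mathrm{CPO}} \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(\beta \log \pi_\theta(y_w \mid x) \;-\; \beta \log \pi_\theta(y_l \mid x)\big)\Big] \;-\; \mathbb{E}_{(x,\,y_w)}\big[\log \pi_\theta(y_w \mid x)\big]
$$

Here $y_w$ and $y_l$ are the chosen and rejected responses, $\pi_\theta$ is the policy being trained, $\sigma$ is the logistic function, and $\beta$ is a scaling hyperparameter; note that no frozen reference model appears in the loss.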
Intended Use Cases
This model is primarily designed for instruction-following applications, where its CPO-based preference optimization aims to improve response quality and alignment with human preferences. Developers comparing preference optimization methods, or looking for a Llama-3-based model with strengthened instruction-following behavior, may find it a useful checkpoint. Further technical details and implementation specifics are available in the associated repository.
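For reference, a minimal loading and generation sketch using the Hugging Face transformers library is shown below; the prompt, dtype, and sampling settings are illustrative choices, not values recommended by the model authors.

```python
# Minimal inference sketch for princeton-nlp/Llama-3-Instruct-8B-CPO.
# Generation settings here are illustrative, not author-recommended values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Instruct-8B-CPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt with the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Summarize what preference optimization does in two sentences."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and strip the prompt tokens from the decoded output.
output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```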