princeton-nlp/Llama-3-Base-8B-SFT-CPO

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Ctx length: 8k · Published: Jul 6, 2024 · Architecture: Transformer · Warm

princeton-nlp/Llama-3-Base-8B-SFT-CPO is an 8-billion-parameter language model developed by Princeton NLP, based on the Llama-3 architecture with an 8192-token context length. It is fine-tuned with Supervised Fine-Tuning (SFT) followed by Contrastive Preference Optimization (CPO), one of the preference optimization methods evaluated in the SimPO research. CPO improves response quality using the policy's own log-likelihoods, without a separate reference model. The model is designed for general language understanding and generation tasks.

Model Overview

princeton-nlp/Llama-3-Base-8B-SFT-CPO is an 8-billion-parameter language model developed by Princeton NLP. It is built on the Llama-3 architecture and has an 8192-token context length. The model is first trained with Supervised Fine-Tuning (SFT) and then with Contrastive Preference Optimization (CPO), and is associated with the research paper SimPO: Simple Preference Optimization with a Reference-Free Reward, where CPO is one of the preference optimization methods studied.
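
To make the CPO step more concrete, here is a minimal sketch of the per-example objective as it is commonly written: a sigmoid preference term on the margin between the policy's log-likelihoods of the chosen and rejected responses, plus a negative log-likelihood term on the chosen response. The function names and the `beta` and `nll_weight` defaults are illustrative assumptions, not values taken from this model's training recipe.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cpo_loss(logp_chosen: float, logp_rejected: float,
             beta: float = 0.1, nll_weight: float = 1.0) -> float:
    """Sketch of a CPO-style loss for one preference pair.

    logp_chosen / logp_rejected are the policy's summed token
    log-probabilities for the preferred and dispreferred responses.
    No reference-model log-probabilities appear anywhere.
    beta and nll_weight are hypothetical defaults for illustration.
    """
    # Preference term: push the chosen response above the rejected one.
    margin = beta * (logp_chosen - logp_rejected)
    prefer = -log_sigmoid(margin)
    # Regularizer: keep the chosen response likely under the policy.
    nll = -logp_chosen
    return prefer + nll_weight * nll
```

A pair where the policy already favors the chosen response, such as `cpo_loss(-5.0, -20.0)`, yields a lower loss than the reversed pair `cpo_loss(-20.0, -5.0)`.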

Key Capabilities

  • Preference Optimization: Fine-tuned with CPO after SFT, aiming to improve response quality without requiring a separate reference model.
  • Llama-3 Architecture: Built on the Llama-3 8B base model, with an 8192-token context length.
  • General Language Tasks: Suitable for a wide range of applications including text generation, summarization, and question answering.
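
For the text generation use case above, a minimal sketch using the Hugging Face `transformers` library might look like the following. It assumes `torch` and `transformers` are installed, that the checkpoint loads under the repository name given on this page, and that bfloat16 weights fit on the available device (the FP8 quantization listed above is not reproduced here). The helper `clamp_new_tokens` is a hypothetical name introduced for illustration; it keeps prompt plus output within the 8192-token context length. A plain text prompt is used, with no assumption about a chat template.

```python
MODEL_ID = "princeton-nlp/Llama-3-Base-8B-SFT-CPO"
CTX_LEN = 8192  # context length stated on this page


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     ctx_len: int = CTX_LEN) -> int:
    """Largest generation budget that keeps prompt + output within ctx_len."""
    return max(0, min(requested, ctx_len - prompt_tokens))


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bfloat16 is an assumption; pick a dtype your hardware supports.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    prompt_len = inputs["input_ids"].shape[1]
    budget = clamp_new_tokens(prompt_len, max_new_tokens)
    if budget == 0:
        raise ValueError("prompt already fills the context window")

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=budget)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0, prompt_len:], skip_special_tokens=True)
```

Calling `generate("Summarize the following paragraph: ...")` downloads the weights on first use, so it is best run on a machine with enough disk space and memory for an 8B model.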

Good For

  • Researchers comparing preference optimization techniques and their impact on LLM performance.
  • Developers seeking a Llama-3-based model fine-tuned with SFT and CPO for improved response quality.
  • Applications that need a capable 8B-parameter model for general natural language processing tasks.