princeton-nlp/Llama-3-Base-8B-SFT-KTO

Hosted on Hugging Face
Task: Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 8k · Published: May 17, 2024 · Architecture: Transformer

princeton-nlp/Llama-3-Base-8B-SFT-KTO is an 8 billion parameter language model developed by princeton-nlp, based on the Llama-3 architecture. Starting from a supervised fine-tuned (SFT) Llama-3 base, it is further aligned with KTO (Kahneman-Tversky Optimization), a preference-tuning method that learns from unpaired binary feedback (individual completions labeled desirable or undesirable) rather than from chosen/rejected response pairs. It was released by the princeton-nlp SimPO project as one of its preference-optimization baselines, and suits tasks that benefit from alignment when paired preference data is hard to collect.


Overview

princeton-nlp/Llama-3-Base-8B-SFT-KTO is an 8 billion parameter model built upon the Llama-3 architecture, developed by princeton-nlp. Its key differentiator is its training recipe: supervised fine-tuning (SFT) followed by KTO (Kahneman-Tversky Optimization), which aligns the model using binary desirable/undesirable labels on single completions instead of paired preference data. The checkpoint was released as a baseline in the study accompanying the preprint SimPO: Simple Preference Optimization with a Reference-Free Reward.
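KTO's unpaired binary-feedback setup can be illustrated with a minimal dataset sketch. The field names below (`prompt`, `completion`, `label`) follow the convention used by common KTO trainers such as trl's KTOTrainer; they are an assumption about typical tooling, not something this model card specifies:

```python
# Minimal sketch of an unpaired binary-feedback dataset for KTO-style
# training. Unlike DPO-style data, each record holds a single completion
# with a boolean desirability label -- no chosen/rejected pair is needed.
kto_examples = [
    {
        "prompt": "What is the capital of France?",
        "completion": "The capital of France is Paris.",
        "label": True,   # desirable completion
    },
    {
        "prompt": "What is the capital of France?",
        "completion": "France does not have a capital.",
        "label": False,  # undesirable completion
    },
]

# KTO treats the two groups separately and can weight them differently.
desirable = [ex for ex in kto_examples if ex["label"]]
undesirable = [ex for ex in kto_examples if not ex["label"]]
print(len(desirable), len(undesirable))  # -> 1 1
```

Because each completion is judged on its own, this format is easier to collect than paired preferences: any thumbs-up/thumbs-down signal maps directly onto the `label` field.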

Key Capabilities

  • Preference Optimization: Aligned with KTO (Kahneman-Tversky Optimization) on top of a supervised fine-tuned checkpoint.
  • Binary Feedback: Learns from unpaired desirable/undesirable labels on single completions, avoiding the need to collect chosen/rejected response pairs.
  • Llama-3 Base: Benefits from the foundational capabilities and architecture of the Llama-3 model family.
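As a standard causal language model checkpoint, the model can be loaded for text generation with the Hugging Face transformers library. A minimal sketch, assuming transformers, torch, and accelerate are installed and enough memory is available for the 8B weights; the `generate` helper below is illustrative, not part of any official API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "princeton-nlp/Llama-3-Base-8B-SFT-KTO"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Download the checkpoint and greedily decode a completion."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Strip the prompt tokens so only the new completion is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Briefly explain what preference optimization does."))
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; sampling parameters such as `temperature` and `top_p` can be passed to `model.generate` for more varied outputs.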

Good For

  • Researchers and developers comparing preference-optimization techniques (e.g., KTO against DPO or SimPO baselines).
  • Applications where collecting paired chosen/rejected responses for reward modeling is challenging or impractical, since KTO needs only binary desirability labels.
  • Tasks requiring a Llama-3 model aligned beyond the SFT stage through a simplified, unpaired-feedback approach.