princeton-nlp/Llama-3-Base-8B-SFT-KTO

Warm · Public · 8B params · FP8 · 8192 context · Hugging Face
Overview

princeton-nlp/Llama-3-Base-8B-SFT-KTO is an 8 billion parameter model built on the Llama-3 architecture, developed by princeton-nlp. Starting from a supervised fine-tuned (SFT) Llama-3-Base-8B checkpoint, it is further aligned with KTO (Kahneman-Tversky Optimization), a preference optimization method that learns from unpaired binary feedback rather than paired chosen/rejected comparisons. The model was released as one of the preference-optimization baselines accompanying the preprint SimPO: Simple Preference Optimization with a Reference-Free Reward.
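
The model can be loaded directly from the Hugging Face Hub. Below is a minimal sketch using the transformers library; the dtype, device placement, and generation settings are illustrative assumptions rather than recommended settings.

```python
# Minimal sketch: load the checkpoint and run greedy generation.
# bfloat16 and device_map are assumptions; the hosted endpoint above
# serves the model in FP8.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Base-8B-SFT-KTO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain the difference between SFT and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```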

Key Capabilities

  • Preference Optimization: Aligned with the KTO objective so that model outputs better match human preferences (a simplified sketch of the loss follows this list).
  • Unpaired Binary Feedback: KTO learns from per-example desirable/undesirable labels instead of paired chosen/rejected comparisons, simplifying data collection.
  • Llama-3 Base: Benefits from the foundational capabilities and architecture of the Llama-3 model family.
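
For intuition about the training objective, here is a simplified sketch of the KTO loss from Ethayarajh et al. (2024). It is not the training code used for this model: the per-class weights, the batch-level KL reference-point estimate, and all variable names are illustrative.

```python
# Simplified sketch of the KTO loss. logp_policy / logp_ref are the
# summed token log-probabilities of each completion under the policy
# and a frozen reference model; `desirable` marks thumbs-up (True)
# vs. thumbs-down (False) examples. The batch-mean estimate of the
# KL reference point z0 is a simplification of the paper's estimator.
import torch

def kto_loss(logp_policy, logp_ref, desirable, beta=0.1, lam_d=1.0, lam_u=1.0):
    # Implicit reward: scaled log-ratio of policy vs. reference likelihoods.
    reward = beta * (logp_policy - logp_ref)
    # KL reference point, treated as a constant (no gradient flows through it).
    z0 = reward.mean().clamp(min=0).detach()
    # Desirable outputs are pushed above z0, undesirable ones below it.
    loss_d = lam_d * (1 - torch.sigmoid(reward - z0))
    loss_u = lam_u * (1 - torch.sigmoid(z0 - reward))
    return torch.where(desirable, loss_d, loss_u).mean()

# Toy usage with made-up log-probabilities for four completions.
logp_policy = torch.tensor([-12.0, -35.0, -20.0, -28.0])
logp_ref = torch.tensor([-14.0, -30.0, -21.0, -25.0])
desirable = torch.tensor([True, False, True, False])
print(kto_loss(logp_policy, logp_ref, desirable))
```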

Good For

  • Researchers and developers comparing preference optimization techniques such as KTO, DPO, and SimPO.
  • Applications where collecting paired preference data for reward modeling is challenging or impractical, since KTO needs only per-example thumbs-up/thumbs-down signals (see the example after this list).
  • Tasks requiring a model aligned beyond SFT through a simple, unpaired binary-feedback approach.
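
For readers who want to apply the same recipe to their own data, KTO-style training consumes unpaired records of the form below. The field names follow the schema documented by the TRL library's KTO support and are an assumption about your training stack, not something specified by this model card.

```python
# Unpaired binary-feedback records in the style KTO consumes: each
# example is a prompt, a single completion, and a thumbs-up/down
# label -- no paired "chosen vs. rejected" comparison is needed.
feedback = [
    {"prompt": "Summarize the report.", "completion": "The report covers...", "label": True},
    {"prompt": "Summarize the report.", "completion": "I don't know.", "label": False},
]
```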