princeton-nlp/Llama-3-Base-8B-SFT-KTO
princeton-nlp/Llama-3-Base-8B-SFT-KTO is an 8-billion-parameter language model from princeton-nlp, built on the Llama-3 architecture. As the name indicates, it was fine-tuned with KTO (Kahneman-Tversky Optimization) on top of a supervised fine-tuned (SFT) Llama-3 base checkpoint, and was released as one of the preference-optimization baselines accompanying the SimPO preprint. KTO aligns a model using unpaired binary feedback, where each response is labeled only as desirable or undesirable, making it useful when paired preference data is hard to obtain.
Overview
princeton-nlp/Llama-3-Base-8B-SFT-KTO applies KTO (Kahneman-Tversky Optimization) to an SFT checkpoint of the Llama-3 8B base model. It was released as a baseline in the study SimPO: Simple Preference Optimization with a Reference-Free Reward, where it serves as a point of comparison for SimPO and other alignment objectives. Unlike SimPO's reference-free reward, KTO optimizes against a frozen reference model, but it requires only binary desirable/undesirable labels on individual responses rather than paired preference comparisons.
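A minimal sketch for trying the checkpoint with Hugging Face Transformers. It assumes a recent transformers install, a GPU with roughly 16 GB of memory for bf16 weights, and that the tokenizer ships a chat template; check the model card for the exact prompt format used during fine-tuning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Base-8B-SFT-KTO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of GPU memory for 8B weights in bf16
    device_map="auto",
)

# Assumes the checkpoint ships a chat template; fall back to plain text
# prompting if the model card specifies a different format.
messages = [{"role": "user", "content": "Explain KTO fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```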
Key Capabilities
- Preference Optimization: Fine-tuned with KTO, which aligns model outputs with human feedback via a prospect-theoretic objective.
- Unpaired Binary Feedback: KTO needs only a desirable/undesirable label per response rather than paired preference comparisons, simplifying data collection (see the schematic loss sketch after this list).
- Llama-3 Base: Inherits the foundational capabilities of the Llama-3 8B base model through its SFT starting checkpoint.
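To make the unpaired-feedback point concrete, below is a schematic PyTorch sketch of the KTO objective (following Ethayarajh et al., 2024). This is an illustration, not the training code behind this checkpoint; the function name, the simplified batch-level KL reference point, and the default β and λ values are assumptions for exposition.

```python
import torch

def kto_loss(policy_logps, ref_logps, desirable,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Schematic KTO loss over a batch of unpaired responses.

    policy_logps / ref_logps: summed log-probs of each response under the
    policy and the frozen reference model (shape: [batch]).
    desirable: bool tensor marking responses labeled "desirable".
    """
    # Implicit reward: beta-scaled log-ratio of policy to reference model.
    rewards = beta * (policy_logps - ref_logps)

    # Reference point z0: a detached batch-level KL estimate, clamped at 0.
    # (The paper uses an estimate over a separate reference batch; this is
    # a simplified stand-in.)
    z0 = (policy_logps - ref_logps).mean().detach().clamp(min=0) * beta

    # Prospect-theoretic value: gains and losses are weighted asymmetrically
    # around the reference point via lambda_d / lambda_u.
    desirable_loss = lambda_d * (1 - torch.sigmoid(rewards - z0))
    undesirable_loss = lambda_u * (1 - torch.sigmoid(z0 - rewards))
    return torch.where(desirable, desirable_loss, undesirable_loss).mean()

# Hypothetical usage with dummy log-probs and labels:
policy_logps = torch.tensor([-12.0, -30.0, -15.0])
ref_logps = torch.tensor([-14.0, -25.0, -15.5])
desirable = torch.tensor([True, False, True])
print(kto_loss(policy_logps, ref_logps, desirable))
```

Note how, unlike DPO-style losses, nothing here pairs a chosen response with a rejected one: each example contributes independently based on its own label, which is what allows KTO to train on unpaired feedback.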
Good For
- Researchers and developers comparing preference optimization methods; this checkpoint is a ready-made KTO baseline for benchmarking against DPO, SimPO, and related objectives.
- Applications where paired preference data is impractical to collect but per-response quality labels (e.g., thumbs up/down signals) are available.
- Tasks that benefit from alignment fine-tuning on top of a strong open 8B base model.