princeton-nlp/Llama-3-Base-8B-SFT-RRHF
princeton-nlp/Llama-3-Base-8B-SFT-RRHF is a language model developed by princeton-nlp, based on the Llama-3 architecture. Starting from a supervised fine-tuned (SFT) checkpoint, it is further trained with RRHF (Rank Responses to align with Human Feedback), one of the baseline preference optimization methods evaluated in the research preprint SimPO: Simple Preference Optimization with a Reference-Free Reward. It was released to support comparison of preference optimization techniques.
Model Overview
This model, princeton-nlp/Llama-3-Base-8B-SFT-RRHF, is a Llama-3-based language model developed by princeton-nlp. Its primary distinction lies in its fine-tuning methodology: starting from an SFT checkpoint, it is trained with RRHF, and it was released as one of the baselines accompanying the research preprint SimPO: Simple Preference Optimization with a Reference-Free Reward.
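For context, the reference-free objective proposed in that preprint (the method this RRHF checkpoint serves as a baseline for) replaces the reference model used by DPO with the length-normalized log-likelihood of the policy itself; in the preprint's notation, roughly:

$$
\mathcal{L}_{\text{SimPO}}(\pi_\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)\right]
$$

where $y_w$ and $y_l$ are the preferred and dispreferred responses, $\beta$ scales the implicit reward, and $\gamma$ is a target reward margin.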
Key Characteristics
- Preference Optimization: Fine-tuned with RRHF, a ranking-based preference optimization method that, like SimPO, requires no reference model; it combines a pairwise ranking loss over candidate responses with a standard SFT loss (see the sketch after this list).
- Research-Oriented: Primarily serves as a baseline released with the SimPO preprint, allowing researchers and developers to compare RRHF against SimPO and other preference optimization strategies.
- Llama-3 Base: Built on the 8B Llama-3 base model with supervised fine-tuning applied before preference optimization, inheriting the foundational language understanding and generation capabilities of Llama-3.
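For reference, RRHF (Yuan et al., 2023) ranks candidate responses by their length-normalized log-likelihood under the policy, penalizes any pair where a lower-reward response out-scores a higher-reward one, and adds an ordinary SFT loss on the best response. The following is a minimal illustrative sketch of that objective, not the authors' training code; the function name and tensor layout are assumptions:

```python
import torch

def rrhf_loss(logprobs: torch.Tensor, lengths: torch.Tensor,
              rewards: torch.Tensor) -> torch.Tensor:
    # logprobs: (k,) summed token log-probs of k candidate responses
    #           to one prompt, under the policy being trained
    # lengths:  (k,) token counts of the responses
    # rewards:  (k,) scalar preference scores for the responses
    p = logprobs / lengths  # length-normalized log-likelihoods
    p_i, p_j = p.unsqueeze(1), p.unsqueeze(0)
    r_i, r_j = rewards.unsqueeze(1), rewards.unsqueeze(0)
    # Ranking term: for each pair where response j is rated below
    # response i, penalize j out-scoring i by max(0, p_j - p_i).
    rank_loss = torch.clamp(p_j - p_i, min=0.0)[r_j < r_i].sum()
    # SFT term: negative log-likelihood of the highest-reward response.
    sft_loss = -logprobs[rewards.argmax()]
    return rank_loss + sft_loss
```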
When to Use This Model
- Research and Development: Ideal for researchers studying preference optimization techniques, particularly ranking-based alternatives to PPO-style RLHF and to DPO.
- Benchmarking Against SimPO: Useful for developers and researchers who want an RRHF baseline when measuring how SimPO-trained models compare with other preference optimization strategies (see the loading sketch after this list).
- Understanding Reference-Free Fine-tuning: Provides a practical example of a model trained without a reference model, offering insight into the behavior and potential applications of such approaches.
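A minimal loading-and-generation sketch with Hugging Face transformers; the prompt, generation settings, and dtype/device choices are illustrative, so adjust them to your hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Base-8B-SFT-RRHF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Explain preference optimization in one paragraph."}]
# Use the tokenizer's chat template if it ships one; otherwise
# fall back to the raw prompt text.
if tokenizer.chat_template:
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
else:
    prompt = messages[0]["content"]

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```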