princeton-nlp/Llama-3-Instruct-8B-RRHF
Llama-3-Instruct-8B-RRHF is an 8 billion parameter instruction-tuned language model released by princeton-nlp. It is fine-tuned with RRHF (Rank Responses to align Human Feedback), a ranking-based preference optimization method that requires no reference model, and was published as one of the baseline checkpoints accompanying the SimPO preprint. It is designed for general instruction-following tasks, using this preference optimization approach to improve response quality.
Model Overview
princeton-nlp/Llama-3-Instruct-8B-RRHF is an 8 billion parameter instruction-tuned language model based on the Llama 3 architecture. It distinguishes itself through its fine-tuning methodology, RRHF (Rank Responses to align Human Feedback; Yuan et al., 2023), which ranks candidate responses by an external reward score and trains the policy with a pairwise ranking loss over its own length-normalized log-likelihoods, plus a supervised term on the best response. Because the objective is computed from the policy alone, no separate reference model is needed. This checkpoint was released alongside the SimPO preprint ("SimPO: Simple Preference Optimization with a Reference-Free Reward") as one of its preference-optimization baselines.
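For readers who want to see how this objective differs from DPO-style losses, below is a minimal sketch of the RRHF objective as described by Yuan et al. (2023): a pairwise ranking hinge loss over length-normalized response log-likelihoods, plus a supervised term on the highest-reward response. The function signature, shapes, and variable names are illustrative assumptions, not the actual training code behind this checkpoint.

```python
import torch
import torch.nn.functional as F

def rrhf_loss(logps: torch.Tensor, lengths: torch.Tensor,
              rewards: torch.Tensor, best_response_nll: torch.Tensor) -> torch.Tensor:
    """Sketch of the RRHF objective (Yuan et al., 2023).

    logps:             (k,) summed token log-probs of k candidate responses
    lengths:           (k,) token counts, used to length-normalize the scores
    rewards:           (k,) external reward scores, used only for ranking
    best_response_nll: scalar cross-entropy on the highest-reward response
    """
    # Length-normalized log-likelihood of each candidate under the policy.
    # Note that no reference model appears anywhere in the objective.
    p = logps / lengths

    # Pairwise ranking hinge: penalize any pair where a lower-reward
    # response is scored higher than a higher-reward one.
    rank_loss = logps.new_zeros(())
    k = p.size(0)
    for i in range(k):
        for j in range(k):
            if rewards[i] > rewards[j]:
                rank_loss = rank_loss + F.relu(p[j] - p[i])

    # SFT term: keep directly fitting the best-ranked response.
    return rank_loss + best_response_nll
```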
Key Capabilities
- Instruction Following: Designed to follow a wide range of user instructions accurately (see the loading sketch after this list).
- Preference Optimization: Trained with the RRHF ranking objective and reported alongside SimPO (Simple Preference Optimization) and other baselines in the same preprint, with the aim of improving response quality and alignment.
- Efficient Fine-tuning: By dropping the reference model used in DPO-style methods, RRHF can simplify the training pipeline and reduce memory overhead (see the loss sketch above).
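The checkpoint loads like any Llama-3-Instruct fine-tune. Below is a minimal usage sketch with Hugging Face transformers, applying the tokenizer's chat template; the prompt and sampling settings are illustrative assumptions, not recommendations from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-Instruct-8B-RRHF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama-3-Instruct models expect the chat template, applied here.
messages = [{"role": "user",
             "content": "Explain preference optimization in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings below are illustrative, not tuned defaults.
outputs = model.generate(inputs, max_new_tokens=256,
                         do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```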
Good For
- Developers interested in exploring alternative preference optimization methods for instruction-tuned models.
- General-purpose conversational AI and instruction-based tasks where high-quality, aligned responses are crucial.
- Research into reward modeling and alignment techniques for large language models.