RLHFlow/pair-preference-model-LLaMA3-8B
The RLHFlow/pair-preference-model-LLaMA3-8B is an 8-billion-parameter pairwise preference model, fine-tuned from Meta-Llama-3-8B-Instruct by RLHFlow. It is trained to rank pairs of responses, evaluating conversational quality, safety, and reasoning. The model is designed for integration into RLHF workflows, where it identifies the preferred output among candidate language-model responses.
RLHFlow/pair-preference-model-LLaMA3-8B Overview
This model is an 8 billion parameter preference model, fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct by RLHFlow. Its primary function is to evaluate and rank pairs of responses, making it a crucial component in Reinforcement Learning from Human Feedback (RLHF) pipelines.
Key Capabilities
- Response Ranking: Designed to compare two given responses (A and B) and determine which is preferred based on learned preferences.
- Performance Metrics: Achieves strong results on reward benchmarks, including Chat 98.6, Chat-Hard 65.8, Safety 89.6, and Reasoning 94.9.
- Multi-turn Conversation Support: Capable of handling preference ranking within multi-turn conversational contexts.
- Bias Mitigation: Implements response swapping during evaluation to mitigate positional bias in ranking.
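The response-swapping idea above can be sketched in a few lines: score the pair in both presentation orders and average, so any systematic preference for the first (or second) slot cancels out. This is a minimal illustration, not the model's actual inference code; `score_pair` is a hypothetical stand-in for a call to the preference model that returns P(first response preferred).

```python
# Positional-bias mitigation via response swapping (illustrative sketch).
# `score_pair(first, second)` is a hypothetical scorer standing in for the
# preference model: it returns the probability that `first` is preferred
# when shown in the first slot. A real scorer would query the model twice.

def debiased_preference(score_pair, resp_a, resp_b):
    """Average P(resp_a preferred) over both presentation orders."""
    p_a_first = score_pair(resp_a, resp_b)         # resp_a shown first
    p_a_second = 1.0 - score_pair(resp_b, resp_a)  # resp_a shown second
    return (p_a_first + p_a_second) / 2.0

# Mock scorer with a deliberate +0.1 bias toward whichever response
# occupies the first slot, to show that the averaging cancels it:
def mock_scorer(first, second):
    base = 0.6 if len(first) > len(second) else 0.4
    return base + 0.1  # positional bias toward the first slot

print(debiased_preference(mock_scorer, "longer response", "short"))  # 0.6
```

With the biased mock scorer, the swapped evaluations yield 0.7 and 0.5, averaging back to the unbiased 0.6.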
Training and Methodology
The model was trained using the RLHFlow/pair_preference_model_dataset and leverages a training script from the RLHF-Reward-Modeling repository. The underlying methodology is detailed in the paper "RLHF Workflow: From Reward Modeling to Online RLHF" (TMLR, 2024), which describes the broader RLHF framework.
Use Cases
This model is ideal for:
- Automated Preference Labeling: Generating preference scores for LLM outputs to guide further fine-tuning.
- Response Quality Evaluation: Assessing the quality, safety, and reasoning capabilities of generated text.
- RLHF Integration: Serving as a reward model within complex RLHF systems to optimize LLM behavior.
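An automated preference-labeling step along these lines might look as follows. The prompt template and the convention of comparing the model's logits for the tokens "A" and "B" are assumptions based on the model's pairwise design; consult the RLHFlow model card for the exact format. Model inference is stubbed out here so the sketch stays self-contained.

```python
# Hedged sketch of automated preference labeling with a pairwise model.
# The [CONTEXT]/[RESPONSE A]/[RESPONSE B] template below is hypothetical;
# the real template is defined in the RLHFlow model card.

def build_pairwise_prompt(context, response_a, response_b):
    """Pair a shared context with two candidate responses."""
    return (f"[CONTEXT] {context} "
            f"[RESPONSE A] {response_a} "
            f"[RESPONSE B] {response_b}")

def label_preference(logit_a, logit_b):
    """Pick the preferred response from the model's logits for 'A'/'B'."""
    return "A" if logit_a > logit_b else "B"

prompt = build_pairwise_prompt("What is 2+2?", "4", "Probably 5.")
# In a real pipeline, `logit_a`/`logit_b` would come from a forward pass
# over `prompt`; here they are placeholder values for illustration.
print(label_preference(logit_a=3.2, logit_b=-1.1))  # "A"
```

The resulting "A"/"B" labels can then feed downstream fine-tuning (e.g. DPO-style pipelines) or be combined with the response-swapping trick described above.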