nvidia/Llama-3.1-Nemotron-70B-Reward-HF

70B · FP8 · 32768
Sep 28, 2024 · License: llama3.1

Model Overview

The nvidia/Llama-3.1-Nemotron-70B-Reward-HF is a 70-billion-parameter reward model developed by NVIDIA, built on the Llama-3.1-70B-Instruct base. Its core function is to evaluate assistant-generated responses in English conversations of up to 4,096 tokens and assign each a quality score. The model is trained with a novel approach that combines Bradley-Terry and SteerLM Regression Reward Modelling.
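As a rough illustration, the Bradley-Terry component of such a training objective scores a preferred ("chosen") response above a rejected one via a pairwise logistic loss. The sketch below is a generic textbook formulation, not NVIDIA's actual training code, and the combination with SteerLM regression is not shown.

```python
import math

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected).

    Small when the reward model scores the preferred response higher;
    grows as the margin flips in favor of the rejected response.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) rewritten as log(1 + exp(-m)) for clarity
    return math.log1p(math.exp(-margin))
```

Minimizing this loss over many preference pairs pushes the reward model to assign higher scalar scores to responses humans preferred.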

Key Capabilities & Differentiators

  • Response Quality Prediction: Accurately rates the quality of LLM-generated assistant turns, with higher scores indicating better quality for a given prompt.
  • RLHF Optimization: This reward model has been instrumental in tuning a Llama-3.1-70B-Instruct model, achieving strong performance on alignment benchmarks like AlpacaEval 2 LC (57.6), Arena Hard (85.0), and GPT-4-Turbo MT-Bench (8.98).
  • Leading Performance: As of October 1, 2024, the Llama-3.1 model tuned with this reward signal ranks #1 on these automatic alignment benchmarks, ahead of models such as GPT-4o and Claude 3.5 Sonnet.
  • RewardBench Leader: Tops the RewardBench leaderboard with 94.1% overall accuracy, including strong scores in Chat (97.5%), Safety (95.1%), and Reasoning (98.1%), while being trained exclusively on permissively licensed (CC-BY-4.0) data.
  • Human Preference Alignment: While it may trail some models on GPT-4-annotated benchmarks, it performs comparably or better on categories using human annotations as ground truth, suggesting strong alignment with human preferences.

Usage Considerations

  • Hardware Requirements: Requires 2 or more 80GB NVIDIA Ampere (or newer) GPUs and approximately 150GB of free disk space.
  • Input/Output: Takes text input (conversation turns) and outputs a single float representing the reward score.
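The input/output contract above can be sketched with the Hugging Face transformers API. This is a minimal sketch under assumptions: the checkpoint is assumed to load as a sequence-classification model whose single output logit is the reward, and the chat formatting follows standard `apply_chat_template` usage; consult the official model card for the exact recipe.

```python
def build_messages(prompt: str, response: str) -> list[dict]:
    """Build the two-turn conversation (user prompt + assistant reply) to score."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]

def score_response(prompt: str, response: str) -> float:
    """Return the scalar reward for one exchange; higher means better quality."""
    # Heavy dependencies imported lazily so the helper above stays importable.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "nvidia/Llama-3.1-Nemotron-70B-Reward-HF"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Requires 2+ 80GB Ampere-or-newer GPUs; device_map="auto" shards the weights.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        build_messages(prompt, response), tokenize=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        logits = model(input_ids).logits  # assumed shape: (1, 1)
    return logits[0][0].item()
```

In an RLHF pipeline, `score_response` would be called on each candidate completion to supply the reward signal for policy optimization.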

This model is ideal for developers focused on fine-tuning LLMs through RLHF or for applications requiring robust, automated evaluation of conversational AI outputs.