nvidia/Llama-3.1-Nemotron-70B-Reward-HF

Text Generation · Concurrency cost: 4 · Model size: 70B · Quantization: FP8 · Context length: 32k · Published: Sep 28, 2024 · License: llama3.1 · Architecture: Transformer

nvidia/Llama-3.1-Nemotron-70B-Reward-HF is a 70-billion-parameter reward model from NVIDIA, built on Llama-3.1-70B-Instruct. It predicts the quality of LLM-generated responses by assigning a reward score to assistant turns in English conversations of up to 4,096 tokens, making it suitable for automated assessment of LLM outputs and for Reinforcement Learning from Human Feedback (RLHF).


Model Overview

Built on the Llama-3.1-70B-Instruct base, this reward model evaluates assistant-generated responses in English conversations of up to 4,096 tokens and returns a single quality score, with higher scores indicating better responses. It uses a novel training approach that combines Bradley-Terry and SteerLM Regression reward modelling.
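
To make the workflow concrete, here is a minimal sketch of scoring a conversation with the Hugging Face transformers library. It follows the common pattern for this model of generating a single token and reading out its score as the scalar reward; treat the exact extraction step as an assumption to verify against the official model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nvidia/Llama-3.1-Nemotron-70B-Reward-HF"

# device_map="auto" shards the 70B weights across all available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The conversation to evaluate; the reward applies to the final assistant turn.
messages = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, return_tensors="pt", return_dict=True
).to(model.device)

# Generate exactly one token; its score is read out as the reward
# (assumed extraction pattern -- confirm against the model card).
out = model.generate(
    **inputs,
    max_new_tokens=1,
    return_dict_in_generate=True,
    output_scores=True,
)
reward = out.scores[0][0][0].item()
print(f"reward: {reward:.3f}")
```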

Key Capabilities & Differentiators

  • Response Quality Prediction: Accurately rates the quality of LLM-generated assistant turns, with higher scores indicating better quality for a given prompt.
  • RLHF Optimization: This reward model was used to tune a Llama-3.1-70B-Instruct model, achieving strong performance on alignment benchmarks such as AlpacaEval 2 LC (57.6), Arena Hard (85.0), and GPT-4-Turbo MT-Bench (8.98); a best-of-N reranking sketch follows this list.
  • Leading Performance: As of October 1, 2024, it ranks #1 on several automatic alignment benchmarks, outperforming models such as GPT-4o and Claude 3.5 Sonnet.
  • RewardBench Leader: Demonstrates top overall performance on the RewardBench leaderboard (94.1%), with strong scores in the Chat (97.5%), Safety (95.1%), and Reasoning (98.1%) categories, while being trained exclusively on permissively licensed data (CC-BY-4.0).
  • Human Preference Alignment: While it may trail some models on GPT-4-annotated benchmarks, it performs comparably or better in categories that use human annotations as ground truth, suggesting strong alignment with human preferences.
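
As a concrete sketch of the RLHF-adjacent reranking use case mentioned above, the snippet below selects the best of N candidate completions. `score_response` is a hypothetical helper, assumed to wrap the scoring code from the Model Overview and return the scalar reward for a (prompt, response) pair.

```python
from typing import Callable, List, Tuple

def best_of_n(
    prompt: str,
    candidates: List[str],
    score_response: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the candidate the reward model scores highest for this prompt.

    score_response is a hypothetical callable: it should build the
    [user, assistant] message pair and return the model's scalar reward.
    """
    scored = [(c, score_response(prompt, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])

# Usage (assuming score_response is defined as described above):
# best, reward = best_of_n("Explain FP8 in one sentence.", drafts, score_response)
```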

Usage Considerations

  • Hardware Requirements: Requires 2 or more 80GB NVIDIA Ampere (or newer) GPUs and approximately 150GB of free disk space.
  • Input/Output: Takes text input (conversation turns) and outputs a single float representing the reward score.
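
Since the reward is only defined for conversations of up to 4,096 tokens, it can be worth checking the templated length before scoring. A minimal sketch, assuming the tokenizer from the overview example:

```python
MAX_REWARD_CONTEXT = 4096  # documented limit for reward scoring

def fits_reward_context(tokenizer, messages) -> bool:
    """True if the chat-templated conversation fits within 4,096 tokens."""
    token_ids = tokenizer.apply_chat_template(messages, tokenize=True)
    return len(token_ids) <= MAX_REWARD_CONTEXT
```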

This model is ideal for developers focused on fine-tuning LLMs through RLHF or for applications requiring robust, automated evaluation of conversational AI outputs.