nvidia/Llama-3.3-Nemotron-70B-Feedback

70B
FP8
32768
Mar 14, 2025
License: nvidia-open-model-license

Model Overview

NVIDIA's Llama-3.3-Nemotron-70B-Feedback is a 70-billion-parameter large language model built on Meta-Llama-3.3-70B-Instruct. It is trained with supervised fine-tuning (SFT) to evaluate LLM-generated responses to user queries and to give feedback on their helpfulness. It is a key component of the Feedback-Edit Inference-Time Scaling (ITS) approach, which demonstrated leading performance on the Arena Hard leaderboard as of March 2025.

Key Capabilities

  • Feedback Generation: Specializes in assessing and articulating the helpfulness of LLM outputs.
  • Inference-Time Scaling (ITS): Designed to be integrated into a system (alongside Llama-3.3-Nemotron-70B-Edit and Llama-3.3-Nemotron-70B-Select) to enhance overall model performance for general-domain, open-ended tasks.
  • Foundation Model: Built on the Llama 3.3 architecture through its Meta-Llama-3.3-70B-Instruct base.
  • Commercial Use: Licensed for commercial applications under the NVIDIA Open Model License and the Llama 3.3 Community License.
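
The way the Feedback model might sit alongside the Edit and Select models can be sketched roughly as follows. This is a minimal sketch, not NVIDIA's implementation: the ordering (critique each draft, revise it, then pick one) is an assumption, and `feedback_edit_pipeline`, `give_feedback`, `edit`, and `select` are hypothetical names. The three callables stand in for calls to the respective models, and toy functions replace them here so the example runs without any model.

```python
from typing import Callable, List


def feedback_edit_pipeline(
    prompt: str,
    drafts: List[str],
    give_feedback: Callable[[str, str], str],
    edit: Callable[[str, str, str], str],
    select: Callable[[str, List[str]], int],
) -> str:
    """Critique each draft, revise it using the critique, then pick one.

    All three callables are hypothetical stand-ins for the Feedback,
    Edit, and Select models; the ordering is an assumption.
    """
    if not drafts:
        raise ValueError("at least one draft response is required")

    edited = []
    for draft in drafts:
        critique = give_feedback(prompt, draft)       # Feedback model's role
        edited.append(edit(prompt, draft, critique))  # Edit model's role

    best_index = select(prompt, edited)               # Select model's role
    return edited[best_index]


# Toy stand-ins so the sketch is runnable without any model.
def toy_feedback(prompt: str, response: str) -> str:
    return "too short" if len(response) < 10 else "fine"


def toy_edit(prompt: str, response: str, critique: str) -> str:
    return response + " (expanded)" if critique == "too short" else response


def toy_select(prompt: str, candidates: List[str]) -> int:
    # Picks the longest candidate; a real selector would judge quality.
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


result = feedback_edit_pipeline(
    "What is FP8?",
    ["8-bit.", "FP8 is an 8-bit floating-point format."],
    toy_feedback,
    toy_edit,
    toy_select,
)
```

Passing the models in as callables keeps the orchestration separate from any particular serving API, so each stage can be swapped or mocked independently.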

Good For

  • Improving LLM Performance: Ideal for developers looking to enhance the quality and helpfulness of their LLM outputs through an automated feedback mechanism.
  • Research in LLM Evaluation: Useful for studying and implementing advanced inference-time scaling techniques.
  • Applications Requiring Response Quality Assessment: Suitable for systems where automated evaluation of generated text is critical.

Training Details

The model was trained and tested on the HelpSteer3 dataset, which comprises 77,564 prompt-response samples for training and 4,078 for testing, each annotated with human-written free-text feedback on overall helpfulness. The model accepts inputs of up to 128k tokens and generates outputs of up to 4k tokens.
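
A caller may want to check a request against these limits before sending it. The sketch below assumes "128k" and "4k" mean 128,000 and 4,000 tokens, a conservative reading since the exact values are not given here. `clamp_request` and both constants are hypothetical names, and a hosting provider may enforce a smaller context window than the model itself supports.

```python
# Assumed conservative readings of "128k" and "4k"; exact limits are not stated.
MAX_INPUT_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 4_000


def clamp_request(input_tokens: int, requested_output_tokens: int) -> int:
    """Return an output-token budget that respects the stated limits.

    Raises ValueError if the input is too long or a count is negative.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"input of {input_tokens} tokens exceeds {MAX_INPUT_TOKENS}"
        )
    # Cap the requested output length at the assumed output limit.
    return min(requested_output_tokens, MAX_OUTPUT_TOKENS)
```

For example, a 1,000-token input with a requested 8,000-token output would be capped at the assumed 4,000-token output limit.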