Model Overview
NVIDIA's Llama-3.3-Nemotron-70B-Feedback is a 70-billion-parameter large language model built on Meta-Llama-3.3-70B-Instruct. It is fine-tuned with supervised fine-tuning (SFT) to evaluate LLM-generated responses to user queries and provide free-text feedback on their helpfulness. The model is a key component of the Feedback-Edit Inference Time Scaling (ITS) approach, which demonstrated leading performance on the Arena Hard Leaderboard as of March 2025.
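The snippet below is a minimal sketch of querying the model with Hugging Face transformers. It assumes the checkpoint id `nvidia/Llama-3.3-Nemotron-70B-Feedback` and a simple two-turn chat layout (user query followed by the candidate response to critique); the exact prompt template may differ, so consult the official model card before use.

```python
# Minimal sketch: asking the Feedback model to critique a candidate response.
# The checkpoint id and chat layout below are assumptions, not the official template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.3-Nemotron-70B-Feedback"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

user_query = "Explain why the sky is blue in two sentences."
candidate_response = "The sky is blue because shorter wavelengths scatter more in air."

# Present the query and the candidate response, then let the model generate feedback.
messages = [
    {"role": "user", "content": user_query},
    {"role": "assistant", "content": candidate_response},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
feedback = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(feedback)
```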
Key Capabilities
- Feedback Generation: Specializes in assessing and articulating the helpfulness of LLM outputs.
- Inference-Time Scaling (ITS): Designed to be integrated into a pipeline alongside Llama-3.3-Nemotron-70B-Edit and Llama-3.3-Nemotron-70B-Select to improve response quality on general-domain, open-ended tasks (see the sketch after this list).
- Foundation Model: Built on the Llama 3.3 architecture, offering a robust base for its specialized feedback function.
- Commercial Use: Licensed for commercial applications under the NVIDIA Open Model License and Llama 3.3 Community License.
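The following is a hypothetical sketch of how the Feedback-Edit ITS loop can be orchestrated. The `generate_*` and `select_best` callables stand in for calls to the base instruct model and the Feedback, Edit, and Select models; the function names, loop structure, and round count are illustrative assumptions, not NVIDIA's reference implementation.

```python
# Hypothetical Feedback-Edit inference-time scaling loop (illustrative only).
from typing import Callable, List


def feedback_edit_its(
    prompt: str,
    generate_response: Callable[[str], str],       # base instruct model
    generate_feedback: Callable[[str, str], str],  # Llama-3.3-Nemotron-70B-Feedback
    generate_edit: Callable[[str, str, str], str], # Llama-3.3-Nemotron-70B-Edit
    select_best: Callable[[str, List[str]], str],  # Llama-3.3-Nemotron-70B-Select
    num_rounds: int = 3,
) -> str:
    """Iteratively critique and revise a draft, then pick the best candidate."""
    candidates = [generate_response(prompt)]
    for _ in range(num_rounds):
        draft = candidates[-1]
        feedback = generate_feedback(prompt, draft)       # critique helpfulness
        revised = generate_edit(prompt, draft, feedback)  # revise using the critique
        candidates.append(revised)
    return select_best(prompt, candidates)                # pick the final answer
```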
Good For
- Improving LLM Performance: Ideal for developers looking to enhance the quality and helpfulness of their LLM outputs through an automated feedback mechanism.
- Research in LLM Evaluation: Useful for studying and implementing advanced inference-time scaling techniques.
- Applications Requiring Response Quality Assessment: Suitable for systems where automated evaluation of generated text is critical.
Training Details
The model was trained and evaluated on the HelpSteer3 dataset, which contains 77,564 prompt-response pairs for training and 4,078 for testing, each annotated with human-written free-text feedback on overall helpfulness. The model accepts inputs of up to 128K tokens and generates outputs of up to 4K tokens.
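As a quick sanity check, the HelpSteer3 splits can be inspected with the Hugging Face `datasets` library. The dataset id and subset name below are assumptions; the actual configuration names and field names may differ.

```python
# Sketch: loading the HelpSteer3 feedback data referenced above.
# "nvidia/HelpSteer3" and the "feedback" subset are assumed identifiers.
from datasets import load_dataset

ds = load_dataset("nvidia/HelpSteer3", "feedback")  # assumed dataset id / subset
print(ds)                       # expect roughly 77,564 train and 4,078 test rows
print(ds["train"][0].keys())    # prompt, response, and free-text feedback fields
```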