Name: nvidia/Llama-3.3-Nemotron-70B-Feedback API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: nvidia

Model Overview

NVIDIA's Llama-3.3-Nemotron-70B-Feedback is a 70 billion parameter large language model, leveraging the Meta-Llama-3.3-70B-Instruct as its base. This model is uniquely fine-tuned using Supervised Finetuning to evaluate and provide feedback on the helpfulness of LLM-generated responses to user queries. It is a key component in the Feedback-Edit Inference Time Scaling (ITS) approach, which has demonstrated leading performance on the Arena Hard Leaderboard as of March 2025.

Key Capabilities

Feedback Generation: Specializes in assessing and articulating the helpfulness of LLM outputs.
Inference-Time Scaling (ITS): Designed to be integrated into a system (alongside Llama-3.3-Nemotron-70B-Edit and Llama-3.3-Nemotron-70B-Select) to enhance overall model performance for general-domain, open-ended tasks.
Foundation Model: Built on the Llama 3.3 architecture, offering a robust base for its specialized feedback function.
Commercial Use: Licensed for commercial applications under the NVIDIA Open Model License and Llama 3.3 Community License.

Good For

Improving LLM Performance: Ideal for developers looking to enhance the quality and helpfulness of their LLM outputs through an automated feedback mechanism.
Research in LLM Evaluation: Useful for studying and implementing advanced inference-time scaling techniques.
Applications Requiring Response Quality Assessment: Suitable for systems where automated evaluation of generated text is critical.

Training Details

The model was trained and tested on the HelpSteer3 dataset, which comprises 77,564 prompt-responses (for training) and 4,078 prompt-responses (for testing), each annotated with human-generated free-text feedback on overall helpfulness. The input supports up to 128k tokens, with output up to 4k tokens.