nvidia/Llama-3.3-Nemotron-70B-Select

Parameters: 70B
Precision: FP8
Context length: 32768 tokens
License: nvidia-open-model-license

Model Overview

nvidia/Llama-3.3-Nemotron-70B-Select is a 70-billion-parameter large language model from NVIDIA, fine-tuned from Meta-Llama-3.3-70B-Instruct. It is trained with scaled Bradley-Terry modeling to score LLM-generated responses to a user query, so that the most helpful response can be selected. The model is one of three components of NVIDIA's Feedback-Edit Inference-Time Scaling (ITS) system, alongside Llama-3.3-Nemotron-70B-Feedback and Llama-3.3-Nemotron-70B-Edit.
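Under the Bradley-Terry model, a reward model assigns each response a scalar score r, and the probability that a chosen response beats a rejected one is sigmoid(r_chosen - r_rejected). The minimal sketch below assumes that the "scaled" variant multiplies each pair's negative log-likelihood by the annotated preference strength, so strongly preferred pairs contribute more to the loss. The function names and the strength scale are illustrative assumptions, not NVIDIA's training code.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both large positive and negative x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def scaled_bt_loss(r_chosen: float, r_rejected: float, strength: float = 1.0) -> float:
    """Bradley-Terry negative log-likelihood, weighted by preference strength.

    Assumption: "scaled" means multiplying the per-pair loss by the annotated
    magnitude of the preference (e.g. 1 = slightly better, 3 = much better).
    """
    return -strength * log_sigmoid(r_chosen - r_rejected)


def batch_loss(pairs):
    # pairs: iterable of (r_chosen, r_rejected, strength) tuples; returns the mean loss.
    losses = [scaled_bt_loss(c, r, m) for c, r, m in pairs]
    return sum(losses) / len(losses)


# Hypothetical rewards for two preference pairs with different strengths.
example = batch_loss([(1.2, 0.3, 3.0), (0.1, 0.4, 1.0)])
```

At inference time only the learned score r is needed: the response with the highest score is the one the model prefers.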

Key Capabilities

  • Response Selection: Scores candidate responses generated by other LLMs and selects the most helpful one, targeting general-domain, open-ended tasks.
  • Inference-Time Scaling: Acts as the selection stage of an ITS pipeline, improving final output quality without retraining the model that produced the drafts.
  • Foundation Model: Fine-tuned from Meta-Llama-3.3-70B-Instruct, inheriting its general language understanding.
  • Commercial Use: Ready for commercial deployment under the NVIDIA Open Model License and Llama 3.3 Community License.
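The selection step, and where it sits in a Feedback-Edit ITS loop, can be sketched as best-of-N selection over scalar rewards. This is a hedged illustration: the callables `feedback`, `edit`, and `score` are hypothetical stand-ins for Llama-3.3-Nemotron-70B-Feedback, Llama-3.3-Nemotron-70B-Edit, and Llama-3.3-Nemotron-70B-Select, and their signatures are assumptions rather than a published API. The real system's number of drafts and any branching are not modeled.

```python
from typing import Callable, List, Sequence

# Hypothetical signatures for the three models in the pipeline (assumptions).
FeedbackFn = Callable[[str, str], str]   # (prompt, draft) -> feedback text
EditFn = Callable[[str, str, str], str]  # (prompt, draft, feedback) -> edited response
ScoreFn = Callable[[str, str], float]    # (prompt, response) -> scalar reward


def select_best(prompt: str, candidates: Sequence[str], score: ScoreFn) -> str:
    """Best-of-N selection: return the candidate with the highest reward."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda response: score(prompt, response))


def feedback_edit_select(prompt: str, drafts: Sequence[str],
                         feedback: FeedbackFn, edit: EditFn, score: ScoreFn) -> str:
    """One simplified round: critique each draft, edit it, then select the best edit."""
    edited: List[str] = []
    for draft in drafts:
        critique = feedback(prompt, draft)
        edited.append(edit(prompt, draft, critique))
    return select_best(prompt, edited, score)


# Toy stand-ins so the sketch runs without any model endpoint.
def toy_feedback(prompt: str, draft: str) -> str:
    return "add detail" if len(draft) < 20 else "ok"


def toy_edit(prompt: str, draft: str, critique: str) -> str:
    return draft + " (with more detail)" if critique == "add detail" else draft


def toy_score(prompt: str, response: str) -> float:
    return float(len(response))  # toy reward: longer counts as better


best = feedback_edit_select("Explain ITS.", ["Short.", "A somewhat longer draft answer."],
                            toy_feedback, toy_edit, toy_score)
```

Here the toy reward simply favors longer text; with the real Select model, `score` would return the model's reward for each candidate. This sketch selects only among edited responses; whether original drafts also enter the candidate pool is a design choice the sketch does not settle.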

Performance

With the Feedback-Edit ITS approach, Llama-3.3-Nemotron-Super-49B-v1 reaches a score of 93.4 on Arena Hard. This score reflects the full Feedback, Edit, and Select pipeline, in which this model performs the final selection step.

Use Cases

This model is ideal for developers looking to:

  • Improve the quality and helpfulness of LLM outputs in their applications.
  • Implement advanced Inference-Time Scaling strategies for response generation.
  • Enhance general-domain, open-ended conversational AI systems by selecting the most helpful candidate response.