nvidia/Llama-3.3-Nemotron-70B-Select

Hugging Face
Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Context Length: 32k · Published: Mar 14, 2025 · License: nvidia-open-model-license · Architecture: Transformer

The nvidia/Llama-3.3-Nemotron-70B-Select is a 70 billion parameter large language model developed by NVIDIA, built upon the Meta-Llama-3.3-70B-Instruct foundation. It is fine-tuned using scaled Bradley-Terry modeling to select the most helpful LLM-generated response to a user query. By identifying high-quality outputs, the model improves performance on general-domain, open-ended tasks, making it suitable for integration into Inference-Time-Scaling systems.


Model Overview

nvidia/Llama-3.3-Nemotron-70B-Select is a 70 billion parameter large language model from NVIDIA, based on the Meta-Llama-3.3-70B-Instruct architecture. Its core function is to select the most helpful LLM-generated response to user queries, achieved through fine-tuning with scaled Bradley-Terry modeling. This model is a key component of NVIDIA's Feedback-Edit Inference Time Scaling (ITS) system, alongside Llama-3.3-Nemotron-70B-Feedback and Llama-3.3-Nemotron-70B-Edit.
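Under a Bradley-Terry preference model, each response is assigned a scalar reward, and the probability that response A is preferred over response B is the logistic of their score difference. The sketch below illustrates this relationship; the scores and the scale factor are hypothetical values for demonstration, not outputs of this model:

```python
import math

def preference_probability(score_a: float, score_b: float, scale: float = 1.0) -> float:
    """Bradley-Terry: P(A preferred over B) = sigmoid(scale * (r_A - r_B))."""
    return 1.0 / (1.0 + math.exp(-scale * (score_a - score_b)))

# Hypothetical reward scores for two candidate responses.
p = preference_probability(2.5, 1.0)  # > 0.5, so response A is favored
```

Fine-tuning with this objective teaches the model to assign higher scores to the responses human annotators preferred.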

Key Capabilities

  • Response Selection: Evaluates and selects the highest quality responses from other LLMs for general-domain, open-ended tasks.
  • Inference-Time Scaling: Designed to be integrated into ITS systems to enhance overall LLM performance.
  • Foundation Model: Built on the robust Llama 3.3 architecture, ensuring strong underlying language understanding.
  • Commercial Use: Ready for commercial deployment under the NVIDIA Open Model License and Llama 3.3 Community License.

Performance

When augmented with the Feedback-Edit ITS approach, models like Llama-3.3-Nemotron-Super-49B-v1 achieve an Arena Hard score of 93.4, demonstrating the effectiveness of this selection mechanism in improving overall system performance.

Use Cases

This model is ideal for developers looking to:

  • Improve the quality and helpfulness of LLM outputs in their applications.
  • Implement advanced Inference-Time Scaling strategies for response generation.
  • Enhance general-domain, open-ended conversational AI systems by ensuring optimal response selection.
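In practice, response selection amounts to a best-of-N loop: generate several candidate responses, score each one with the Select model, and keep the highest-scoring candidate. A minimal sketch, assuming a `score_fn(prompt, response)` callable that wraps a call to the model; the `toy_score` function below is a stand-in placeholder, not the real scorer:

```python
def select_best_response(prompt, candidates, score_fn):
    """Return the candidate with the highest reward score (best-of-N selection)."""
    return max(candidates, key=lambda response: score_fn(prompt, response))

# Stand-in scorer for illustration only; a real deployment would call
# Llama-3.3-Nemotron-70B-Select to obtain a scalar helpfulness score.
def toy_score(prompt, response):
    return len(response)  # placeholder heuristic

best = select_best_response("What is 2+2?", ["4", "The answer to 2+2 is 4."], toy_score)
```

The same loop slots into an ITS pipeline, where the Feedback and Edit models refine candidates before the Select model picks the final answer.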

Popular Sampler Settings

The sampler parameters most commonly configured by Featherless users for this model are: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.