nvidia/Llama-3.3-Nemotron-70B-Select

Hugging Face
Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Context Length: 32k · Published: Mar 14, 2025 · License: nvidia-open-model-license · Architecture: Transformer

The nvidia/Llama-3.3-Nemotron-70B-Select is a 70 billion parameter large language model developed by NVIDIA, built upon the Meta-Llama-3.3-70B-Instruct foundation. It is fine-tuned using scaled Bradley-Terry modeling to select the most helpful LLM-generated response to a user query. By identifying high-quality outputs, the model improves performance on general-domain, open-ended tasks, making it suitable for integration into Inference-Time-Scaling systems.


Model Overview

nvidia/Llama-3.3-Nemotron-70B-Select is a 70 billion parameter large language model from NVIDIA, based on the Meta-Llama-3.3-70B-Instruct architecture. Its core function is to select the most helpful LLM-generated response to user queries, achieved through fine-tuning with scaled Bradley-Terry modeling. This model is a key component of NVIDIA's Feedback-Edit Inference Time Scaling (ITS) system, alongside Llama-3.3-Nemotron-70B-Feedback and Llama-3.3-Nemotron-70B-Edit.
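Under a Bradley-Terry preference model, each response is assigned a scalar reward, and the probability that response A is preferred over response B is the logistic of their score difference. The sketch below illustrates this relationship; the scores and the scale factor are hypothetical values for demonstration, not outputs of this model:

```python
import math

def preference_probability(score_a: float, score_b: float, scale: float = 1.0) -> float:
    """Bradley-Terry: P(A preferred over B) = sigmoid(scale * (r_A - r_B))."""
    return 1.0 / (1.0 + math.exp(-scale * (score_a - score_b)))

# Hypothetical reward scores for two candidate responses.
p = preference_probability(2.5, 1.0)  # > 0.5, so response A is favored
```

Fine-tuning with this objective teaches the model to assign higher scores to the responses human annotators preferred.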

Key Capabilities

  • Response Selection: Evaluates and selects the highest quality responses from other LLMs for general-domain, open-ended tasks.
  • Inference-Time Scaling: Designed to be integrated into ITS systems to enhance overall LLM performance.
  • Foundation Model: Built on the robust Llama 3.3 architecture, ensuring strong underlying language understanding.
  • Commercial Use: Ready for commercial deployment under the NVIDIA Open Model License and Llama 3.3 Community License.

Performance

When augmented with the Feedback-Edit ITS approach, models like Llama-3.3-Nemotron-Super-49B-v1 achieve an Arena Hard score of 93.4, demonstrating the effectiveness of this selection mechanism in improving overall system performance.

Use Cases

This model is ideal for developers looking to:

  • Improve the quality and helpfulness of LLM outputs in their applications.
  • Implement advanced Inference-Time Scaling strategies for response generation.
  • Enhance general-domain, open-ended conversational AI systems by ensuring optimal response selection.
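In practice, response selection amounts to a best-of-N loop: generate several candidate responses, score each one with the Select model, and keep the highest-scoring candidate. A minimal sketch, assuming a `score_fn(prompt, response)` callable that wraps a call to the model; the `toy_score` function below is a stand-in placeholder, not the real scorer:

```python
def select_best_response(prompt, candidates, score_fn):
    """Return the candidate with the highest reward score (best-of-N selection)."""
    return max(candidates, key=lambda response: score_fn(prompt, response))

# Stand-in scorer for illustration only; a real deployment would call
# Llama-3.3-Nemotron-70B-Select to obtain a scalar helpfulness score.
def toy_score(prompt, response):
    return len(response)  # placeholder heuristic

best = select_best_response("What is 2+2?", ["4", "The answer to 2+2 is 4."], toy_score)
```

The same loop slots into an ITS pipeline, where the Feedback and Edit models refine candidates before the Select model picks the final answer.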

Popular Sampler Settings

The sampler parameters most commonly configured by Featherless users for this model are: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.