nvidia/Llama-3.3-Nemotron-70B-Edit

Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Ctx Length: 32k · Published: Mar 14, 2025 · License: nvidia-open-model-license · Architecture: Transformer · Open Weights

nvidia/Llama-3.3-Nemotron-70B-Edit is a 70-billion-parameter large language model developed by NVIDIA, built on Meta-Llama-3.3-70B-Instruct. It is fine-tuned with supervised fine-tuning and reinforcement learning specifically to edit LLM-generated responses based on provided feedback. By incorporating user feedback into its revisions, the model improves response helpfulness, making it well suited to general-domain, open-ended tasks that require iterative refinement.


Model Overview

nvidia/Llama-3.3-Nemotron-70B-Edit is a 70-billion-parameter large language model from NVIDIA, built on Meta-Llama-3.3-70B-Instruct. It is fine-tuned with supervised fine-tuning and reinforcement learning to edit LLM-generated responses based on explicit feedback, improving the helpfulness of outputs. The model is a key component of the Feedback-Edit Inference Time Scaling (ITS) system, which has achieved leading performance on the Arena Hard leaderboard, reaching 93.4% when combined with Llama-3.3-Nemotron-Super-49B-v1.

Key Capabilities

  • Feedback-driven Response Editing: Specializes in refining and improving LLM responses by incorporating user feedback.
  • Inference-Time Scaling: Designed to be part of a system that leverages feedback for performance improvement during inference.
  • Commercial Use: Licensed for commercial applications under the NVIDIA Open Model License and Llama 3.3 Community License.
  • General-Domain Tasks: Optimized for open-ended tasks where iterative refinement of responses is beneficial.
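The Feedback-Edit ITS loop behind these capabilities can be sketched as a simple generate-feedback-edit cycle. This is an illustrative sketch, not NVIDIA's implementation: the `generate`, `critique`, and `edit` callables stand in for calls to the generator, feedback, and edit models respectively.

```python
from typing import Callable, List

def feedback_edit_its(
    prompt: str,
    generate: Callable[[str], str],        # base model: prompt -> draft response
    critique: Callable[[str, str], str],   # feedback model: (prompt, response) -> feedback
    edit: Callable[[str, str, str], str],  # edit model: (prompt, response, feedback) -> revision
    rounds: int = 2,
) -> List[str]:
    """Run a generate -> feedback -> edit loop and return all candidate responses."""
    candidates = [generate(prompt)]
    for _ in range(rounds):
        response = candidates[-1]           # edit the latest candidate
        feedback = critique(prompt, response)
        candidates.append(edit(prompt, response, feedback))
    return candidates
```

In the full ITS system a selection model would then score the candidates and return the best one; this sketch simply hands all candidates back to the caller.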

Use Cases

  • Improving LLM Output Quality: Ideal for scenarios where initial LLM responses need to be iteratively improved based on human or automated feedback.
  • Interactive AI Systems: Suitable for applications requiring dynamic adjustment of generated text to better meet user expectations.
  • Research in Feedback Mechanisms: Can be used to explore and develop advanced methods for integrating feedback into LLM workflows.

This model supports a maximum input context of 128k tokens and generates outputs of up to 4k tokens. It was trained on the HelpSteer3 dataset, which contains prompt-response pairs annotated with free-text feedback and edited responses.
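As a minimal sketch of invoking the model through an OpenAI-compatible chat endpoint, the helper below packages a prompt, a draft response, and free-text feedback as chat messages. The message layout (draft as an assistant turn, feedback as a follow-up user turn) is an assumption for illustration, not NVIDIA's documented prompt format.

```python
def build_edit_messages(prompt: str, response: str, feedback: str) -> list:
    """Package a prompt, a draft response, and free-text feedback as chat messages.

    Layout is illustrative: the draft goes in as an assistant turn and the
    feedback as a follow-up user turn asking for a revision.
    """
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
        {"role": "user",
         "content": f"Feedback: {feedback}\nPlease edit your response accordingly."},
    ]

# The messages would then be posted to a served instance, e.g. with the
# openai client (endpoint and serving details are placeholders):
#   client.chat.completions.create(
#       model="nvidia/Llama-3.3-Nemotron-70B-Edit",
#       messages=build_edit_messages(prompt, draft, feedback),
#       max_tokens=4096,
#   )
```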