nvidia/Llama-3_3-Nemotron-Super-49B-v1_5

Hugging Face
TEXT GENERATIONConcurrency Cost:3Model Size:49BQuant:FP8Ctx Length:32kPublished:Jul 25, 2025License:otherArchitecture:Transformer0.2K Warm

The nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 is a 49 billion parameter large language model developed by NVIDIA, derived from Meta's Llama-3.3-70B-Instruct. It is specifically post-trained using a novel Neural Architecture Search (NAS) approach to optimize for reasoning, human chat preferences, and agentic tasks like RAG and tool calling, while balancing accuracy and efficiency. Supporting a 128K token context length, this model is designed for developers building AI agent systems, chatbots, and RAG applications, offering enhanced performance on a single GPU.

Loading preview...

Model Overview

nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 is a 49 billion parameter large language model developed by NVIDIA, derived from Meta's Llama-3.3-70B-Instruct. It features a significantly upgraded architecture achieved through a novel Neural Architecture Search (NAS) approach, which optimizes for an accuracy-efficiency trade-off, enabling larger workloads and fitting on a single H200 GPU. This model supports an extended context length of 128K tokens.

Key Capabilities

  • Enhanced Reasoning & Agentic Tasks: Post-trained for superior reasoning, human chat preferences, and agentic tasks such as RAG and tool calling. This includes supervised fine-tuning for Math, Code, Science, and Tool Calling.
  • Multi-stage Reinforcement Learning: Underwent multiple stages of RL, including Reward-aware Preference Optimization (RPO) for chat, Reinforcement Learning with Verifiable Rewards (RLVR) for reasoning, and iterative Direct Preference Optimization (DPO) for tool calling.
  • Efficiency Optimization: Utilizes a NAS approach to reduce memory footprint and improve throughput, making it efficient for high workloads.
  • Multilingual Support: Primarily designed for English and coding languages, with additional support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Good For

  • Developers designing AI Agent systems.
  • Building advanced chatbots and conversational AI.
  • Implementing RAG (Retrieval Augmented Generation) systems.
  • Applications requiring robust tool calling capabilities.
  • General instruction-following tasks where reasoning is critical.