unsloth/Llama-3_3-Nemotron-Super-49B-v1_5

TEXT GENERATIONConcurrency Cost:3Model Size:49BQuant:FP8Ctx Length:32kPublished:Jul 28, 2025License:nvidia-open-model-licenseArchitecture:Transformer0.0K Open Weights Cold

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49 billion parameter large language model developed by NVIDIA, derived from Meta Llama-3.3-70B-Instruct. It is post-trained for reasoning, human chat preferences, and agentic tasks like RAG and tool calling, supporting a 128K token context length. This model utilizes a novel Neural Architecture Search (NAS) approach to optimize for accuracy and efficiency, enabling deployment on a single GPU. It excels in reasoning, math, code, and tool calling, making it suitable for AI agent systems, chatbots, and instruction-following tasks.

Loading preview...

Model Overview

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49 billion parameter large language model developed by NVIDIA, building upon Meta's Llama-3.3-70B-Instruct. This model is specifically post-trained to enhance reasoning capabilities, align with human chat preferences, and perform agentic tasks such as Retrieval Augmented Generation (RAG) and tool calling. It supports an extended context length of 128K tokens.

Key Differentiators

  • Efficiency and Accuracy Trade-off: Employs a novel Neural Architecture Search (NAS) approach, detailed in this paper, to significantly reduce memory footprint and enable deployment on a single GPU (e.g., H200) while maintaining high accuracy.
  • Multi-phase Post-training: Underwent extensive supervised fine-tuning for Math, Code, Science, and Tool Calling. It also incorporates multiple stages of Reinforcement Learning (RL), including Reward-aware Preference Optimization (RPO) for chat and Reinforcement Learning with Verifiable Rewards (RLVR) for reasoning, alongside iterative Direct Preference Optimization (DPO) for tool calling.
  • Optimized Architecture: Utilizes a customized Llama 3.3 70B Instruct network architecture with non-standard and non-repetitive blocks, including skip attention and variable FFN ratios, derived from NAS and block-wise distillation.

Intended Use Cases

  • AI Agent Systems: Ideal for developers building sophisticated AI agents.
  • Chatbots: Suitable for creating advanced conversational AI applications.
  • RAG Systems: Designed to perform well in Retrieval Augmented Generation scenarios.
  • Tool Calling: Enhanced capabilities for integrating with external tools and functions.
  • Instruction Following: General-purpose instruction-following tasks in English and coding languages, with support for several other non-English languages.