Model Overview

NVIDIA Nemotron-3-Nano-30B-A3B is a 30-billion-parameter large language model developed by NVIDIA with a hybrid Mixture-of-Experts (MoE) architecture. It combines 23 Mamba-2 and MoE layers with 6 attention layers and uses 3.5 billion active parameters out of 30 billion total. The model handles both reasoning and non-reasoning tasks: it can generate a reasoning trace before giving its final response, which generally improves accuracy on complex prompts. This reasoning behavior can be turned on or off with a chat template flag.
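When reasoning is enabled, an application usually needs to separate the reasoning trace from the final response before showing the answer to a user. The sketch below is a minimal illustration that assumes the trace is wrapped in <think> ... </think> tags; the card does not name the delimiters, so check them against the model's actual chat template. SplitOutput and split_reasoning are hypothetical helper names.

```python
from typing import NamedTuple, Optional

# Assumed delimiters for the reasoning trace; confirm against the model's chat template.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


class SplitOutput(NamedTuple):
    reasoning: Optional[str]  # None when no reasoning trace was found
    answer: str


def split_reasoning(text: str) -> SplitOutput:
    """Separate an assumed <think>...</think> reasoning trace from the final response."""
    close_idx = text.find(THINK_CLOSE)
    if close_idx == -1:
        # No closing tag: treat everything as the final response (e.g. reasoning turned off).
        return SplitOutput(None, text.strip())
    reasoning = text[:close_idx]
    open_idx = reasoning.find(THINK_OPEN)
    if open_idx != -1:
        # Drop the opening tag if the model emitted it; a chat template may instead
        # place it in the prompt, in which case generated text starts inside the trace.
        reasoning = reasoning[open_idx + len(THINK_OPEN):]
    answer = text[close_idx + len(THINK_CLOSE):]
    return SplitOutput(reasoning.strip(), answer.strip())


if __name__ == "__main__":
    out = split_reasoning("<think>17 * 3 = 51.</think>The product is 51.")
    print(out.reasoning)  # 17 * 3 = 51.
    print(out.answer)     # The product is 51.
```

Searching for the closing tag first, rather than requiring both tags, keeps the helper working whether or not the opening tag appears in the generated text.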

Key Capabilities

Hybrid MoE Architecture: Combines Mamba-2 and MoE layers with a small number of attention layers for efficient processing; each MoE layer has 128 experts, of which 5 are activated per token.
Reasoning-First Approach: Can generate an explicit reasoning trace to improve accuracy on challenging tasks; this behavior is enabled or disabled via a chat template flag.
Multilingual Support: Supports English, German, Spanish, French, Italian, and Japanese, alongside 43 programming languages.
Extensive Training: Pre-trained on 25 trillion tokens spanning code, math, science, and general-knowledge data, then further fine-tuned with synthetic data.
Long Context: Supports a context length of up to 1M tokens, though the default Hugging Face configuration is 256K.
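To make "128 experts per MoE layer and 5 experts activated per token" concrete, here is a minimal sketch of generic softmax top-k gating for a single token. It is an illustrative assumption, not NVIDIA's router: the actual model may score experts differently (for example with sigmoid gates or a shared expert) and uses training-time load balancing that is omitted here. The experts are hypothetical toy functions on a scalar input, and top_k_route and moe_forward are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

NUM_EXPERTS = 128  # experts per MoE layer, per the model card
TOP_K = 5          # experts activated per token, per the model card


def top_k_route(router_logits: Sequence[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_weight) for the k highest-scoring experts.

    Generic softmax top-k gating (an assumption, not the model's actual router):
    the softmax is taken over the selected experts only, so the k weights sum to 1.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits, best first.
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax over the selected logits (subtract the largest one).
    peak = router_logits[top[0]]
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = TOP_K) -> float:
    """Run only the routed experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    # Hypothetical toy experts: expert s just scales its input by s.
    experts = [lambda x, s=s: s * x for s in range(NUM_EXPERTS)]
    routes = top_k_route(logits)
    print("selected experts:", [i for i, _ in routes])
    print("gate weights sum:", round(sum(w for _, w in routes), 6))
    print("output:", moe_forward(1.0, logits, experts))
```

Under this scheme only 5 of the 128 expert networks (about 3.9%) run for each token, which is why a sparse MoE model's active parameter count (here 3.5B) can sit far below its total (30B).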

Good For

AI Agent Systems: Ideal for developers building intelligent agents that require robust reasoning capabilities.
Chatbots: Suitable for creating advanced conversational AI with improved accuracy and response quality.
RAG Systems: Can serve as the generator in Retrieval-Augmented Generation (RAG) systems, producing answers grounded in retrieved context.
Instruction Following: Excels at general instruction-following tasks across various domains.