Model Overview

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 is a 30 billion parameter large language model (LLM) developed by NVIDIA, featuring a unique hybrid Mixture-of-Experts (MoE) architecture. It combines 23 Mamba-2 and MoE layers with 6 Attention layers, activating 6 out of 128 experts plus 1 shared expert per token, resulting in 3.5 billion active parameters. The model is designed for both reasoning and non-reasoning tasks, capable of generating explicit reasoning traces to improve accuracy on challenging prompts, a feature configurable via the chat template.

Key Capabilities

Advanced Reasoning: Can generate step-by-step reasoning traces for complex problems, enhancing solution quality.
Hybrid MoE Architecture: Leverages a Mamba-2 and Transformer hybrid MoE design for efficiency and performance.
Extensive Context Window: Supports an impressive 1 million token context length, suitable for long-document analysis.
Multilingual Support: Supports English, German, Spanish, French, Italian, and Japanese, with improved performance using Qwen.
Commercial Use Ready: Licensed for commercial applications.
Comprehensive Training: Trained on 25 trillion tokens, including a significant portion of synthetic data across code, math, science, and general knowledge.

Good For

AI Agent Systems: Ideal for developers building sophisticated AI agents that require robust reasoning capabilities.
Chatbots and Conversational AI: Suitable for creating high-quality, instruction-following chatbots.
RAG Systems: Effective for Retrieval-Augmented Generation applications due to its long context handling.
Instruction Following: Excels at general instruction-following tasks, with configurable reasoning behavior.

Overview

Model Overview

Key Capabilities

Good For

Full Model Card (README)