Model Overview

NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 is a 30-billion-parameter base LLM developed by NVIDIA. It uses a Mamba2-Transformer hybrid Mixture of Experts (MoE) architecture, was pre-trained from scratch on a 13.3-trillion-token dataset with a data cutoff of June 25, 2025, and is intended for commercial use.
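The model card names a Mixture of Experts design but does not describe its router. The sketch below is a generic illustration of top-k MoE routing, not NVIDIA's implementation. In an MoE layer, each token is sent to only a few experts, so only a fraction of the total parameters is active for that token. The four scalar "experts", the fixed router scores, and k=2 are all hypothetical. In a real model the router scores come from a learned layer applied to each token's hidden state.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k experts with the highest router scores.

    Applying softmax to only the selected logits is equivalent to a full
    softmax followed by renormalizing over the top k, so weights sum to 1.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x: float, experts: Sequence[Callable[[float], float]],
                router_logits: Sequence[float], k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Hypothetical toy setup: four scalar "experts" and fixed router scores for one token.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
logits = [0.5, 2.0, 1.0, -1.0]
print(top_k_route(logits, k=2))  # experts 1 and 2 are chosen
print(moe_forward(3.0, experts, logits))
```

Computation scales with k rather than with the total expert count. This is why an MoE model's per-token cost can be much lower than its total parameter count suggests.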

Key Capabilities & Performance

The model performs strongly across a range of benchmarks, particularly in the following areas:

Mathematical Reasoning: Achieves 92.34% on GSM8K and 82.88% on MATH, significantly outperforming Qwen3 30B-A3B-Base.
Code Generation: Scores 78.05% on HumanEval (0-shot) and 75.49% on MBPP-Sanitized (3-shot).
Long Context Understanding: Supports context lengths up to 512K tokens, with RULER scores of 87.50% at 64K and 70.56% at 512K, whereas the Qwen3 model used for comparison does not support the longer context lengths.
Multilingual Support: Trained on data in 20 human languages and 43 programming languages, with substantial portions in Arabic, Japanese, Chinese, and several European languages.
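The "0-shot" and "3-shot" labels above indicate how many solved examples precede the test question in the prompt. This setting matters when evaluating a base model that has not been instruction-tuned. The sketch below uses a hypothetical "Question:/Answer:" template and made-up demonstrations. The official benchmark harnesses use their own exact prompt formats and example selections, so this only shows the general shape of a k-shot prompt.

```python
from typing import List, Tuple


def build_k_shot_prompt(examples: List[Tuple[str, str]], query: str, k: int,
                        q_label: str = "Question", a_label: str = "Answer") -> str:
    """Prepend k solved examples before the query; k=0 yields a zero-shot prompt."""
    if k > len(examples):
        raise ValueError("not enough examples for the requested number of shots")
    blocks = [f"{q_label}: {q}\n{a_label}: {a}" for q, a in examples[:k]]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"{q_label}: {query}\n{a_label}:")
    return "\n\n".join(blocks)


# Hypothetical demonstrations, for illustration only.
demos = [("What is 2 + 3?", "5"), ("What is 7 - 4?", "3"), ("What is 6 * 2?", "12")]
print(build_k_shot_prompt(demos, "What is 9 + 1?", k=3))
```

A base model simply continues the text, so the demonstrations steer both the answer format and where the model's completion should stop.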

Training & Data

The training corpus combines crawled, curated, and synthetically generated data, with extensive coverage of code, math, science, and general knowledge. Over 3.5 trillion tokens of the training data were synthetically generated using models such as DeepSeek-R1, Mixtral-8x22B-v0.1, and Qwen2.5-72B. The model is optimized for NVIDIA GPU-accelerated systems.

Use Cases

This model is intended primarily for developers and researchers who build and fine-tune instruction-following LLMs, especially for applications that require strong performance in STEM fields, code generation, and complex reasoning over long contexts.
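Because this is a base model, turning it into an instruction follower typically involves supervised fine-tuning on prompt/response pairs. A common convention is to compute the loss only on response tokens by setting the prompt positions in the labels to -100. PyTorch's `CrossEntropyLoss` ignores that value by default. This is an assumption here, not something the model card prescribes. The sketch also assumes a trainer that, like Hugging Face causal-LM models, shifts labels internally, so labels stay aligned with the inputs. The token IDs, `eos_id`, and `max_len` are hypothetical.

```python
from typing import List, Tuple

# PyTorch's CrossEntropyLoss skips targets equal to ignore_index, which defaults to -100.
IGNORE_INDEX = -100


def build_sft_example(prompt_ids: List[int], response_ids: List[int], eos_id: int,
                      max_len: int) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response; mask prompt positions so loss covers only the response."""
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    # Truncate both sequences identically if the example exceeds the length budget.
    return input_ids[:max_len], labels[:max_len]


# Hypothetical token IDs for a short prompt and response.
ids, labels = build_sft_example([10, 11, 12], [20, 21], eos_id=2, max_len=16)
print(ids)     # [10, 11, 12, 20, 21, 2]
print(labels)  # [-100, -100, -100, 20, 21, 2]
```

Appending the end-of-sequence token to the labels teaches the model where a response should stop. This is a behavior a base model does not reliably have on its own.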
