nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16

Text Generation
Concurrency Cost: 2
Model Size: 30B
Quant: FP8
Ctx Length: 32k
Published: Dec 3, 2025
License: nvidia-nemotron-open-model-license
Architecture: Transformer
0.1K | Open Weights | Cold

NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 is a 30-billion-parameter base large language model developed by NVIDIA, built on a hybrid Mamba2-Transformer Mixture of Experts (MoE) architecture. It is trained for next-token prediction and serves as a foundation for instruction fine-tuning, with strengths in mathematical reasoning, code generation, and long-context understanding at context lengths of up to 512K tokens. The model is ready for commercial use and covers 20 natural languages and 43 programming languages, making it suitable for developers and researchers building specialized AI agents.

Model Overview

NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 is a 30-billion-parameter base large language model developed by NVIDIA. It uses a hybrid Mamba2-Transformer Mixture of Experts (MoE) architecture rather than a pure Transformer stack. The model is pre-trained for next-token prediction, is intended as a starting point for instruction fine-tuning, and is ready for commercial use.
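For capacity planning, a back-of-the-envelope estimate of weight storage can be useful. The sketch below assumes the nominal 30 billion parameter count and the standard storage widths of BF16 (2 bytes per parameter) and FP8 (1 byte per parameter), the two formats named on this page. The function name and constants are illustrative, and the figures ignore activations, caches, and runtime overhead.

```python
# Rough weight-memory estimate; ignores activations, caches, and runtime overhead.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # storage width of each format in bytes

NOMINAL_PARAMS = 30e9  # assumption: the nominal "30B" figure, not an exact count


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("bf16", "fp8"):
        print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB")
```

Under these assumptions the weights alone come to roughly 60 GB in BF16 and 30 GB in FP8. Real deployments need additional headroom on top of that.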

Key Capabilities & Differentiators

  • Hybrid MoE Architecture: Combines Mamba2 and Transformer components in a Mixture of Experts (MoE) design.
  • Exceptional Long Context Handling: Supports context lengths of up to 512K tokens and performs strongly on the RULER benchmarks, where it significantly outperforms a comparable Qwen3 model.
  • Strong Performance in Math & Code: Achieves high scores in MATH (82.88%), GSM8K (92.34%), and HumanEval (78.05%), indicating robust reasoning and coding abilities.
  • Multilingual and Multi-programming Language Support: Trained on 20 human languages and 43 programming languages, enhancing its versatility.
  • Extensive Training Data: Pre-trained on over 13 trillion tokens, including a significant portion of synthetic data generated by various advanced LLMs.
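
To make the Mixture of Experts idea in the list above more concrete, here is a toy sketch of generic top-k expert routing in plain Python. It is not NVIDIA's router: the expert count, the value of k, the scalar "experts", and the function names are all hypothetical, chosen only to show how a router can activate a small subset of experts for each input.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = route_top_k(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Hypothetical experts: each one simply scales its input by a fixed factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

if __name__ == "__main__":
    logits = [0.0, 5.0, 1.0, 5.0]  # made-up router scores for one token
    print(route_top_k(logits, k=2))
    print(moe_forward(2.0, experts, logits, k=2))
```

Only the selected experts run for a given input, which is how an MoE model can hold many parameters while using a fraction of them per token.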

Good For

  • Developers and researchers focused on building advanced instruction-following LLMs.
  • Applications requiring strong mathematical reasoning and code generation capabilities.
  • Use cases demanding very long context understanding and processing.
  • Projects needing a commercially viable base model for further specialization.