Model Overview

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 is a 30-billion-parameter large language model developed by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It employs a hybrid Mixture-of-Experts (MoE) architecture that combines 23 Mamba-2 and MoE layers with 6 Attention layers, and it has 3.5 billion active parameters. The model can be configured either to generate reasoning traces, which improves accuracy on complex prompts, or to answer directly on simpler tasks.
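To make the parameter figures above concrete, the following minimal sketch derives two rough numbers from them: the share of parameters that are active, and the memory needed to hold the weights in BF16. It assumes the nominal 30 billion total from the model name and 2 bytes per BF16 weight; the function names are illustrative, and the estimate covers weights only (no KV cache, Mamba state, or activations).

```python
# Back-of-the-envelope figures based on the numbers quoted in the overview.
# Assumption: the nominal "30B" total is used; the exact count may differ.
TOTAL_PARAMS = 30e9     # total parameters (nominal, from the model name)
ACTIVE_PARAMS = 3.5e9   # active parameters, as stated in the overview
BYTES_PER_BF16 = 2      # bfloat16 stores each value in 16 bits


def active_fraction(active: float, total: float) -> float:
    """Fraction of the model's parameters that are active."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    mem = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_BF16)
    print(f"Active fraction: {frac:.1%}")
    print(f"BF16 weights: about {mem:.0f} GB")
```

Under these assumptions, roughly one parameter in nine is active, while all 30 billion parameters still need to be resident in memory, so the MoE design reduces compute per token rather than weight storage.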

Key Capabilities

Advanced Reasoning: Achieves strong performance on reasoning benchmarks, particularly with tool use (e.g., 99.2% on AIME25 with tools, 75.0% on GPQA with tools).
Agentic Tasks: Demonstrates competitive results on agentic benchmarks such as SWE-Bench (38.8%) and TauBench V2 (49.0% average).
Long Context Understanding: Supports a context length of up to 1 million tokens, with strong performance on RULER-100 benchmarks (e.g., 92.9% at 256k tokens).
Multilingual Support: Supports English, German, Spanish, French, Italian, and Japanese, with significant multilingual pre-training data.
Code Generation: Trained on extensive code data, with strong performance on coding benchmarks such as LiveCodeBench (68.3%).

Good For

Developing AI agent systems that require robust reasoning capabilities.
Building chatbots and RAG systems that benefit from detailed instruction following and conversational quality.
Applications requiring long-context processing and understanding.
Commercial use in various AI-powered applications.
