FriendliAI/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

TEXT GENERATIONConcurrency Cost:2Model Size:30BQuant:FP8Ctx Length:32kPublished:Dec 15, 2025License:nvidia-open-model-licenseArchitecture:Transformer Open Weights Cold

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 is a 30 billion parameter large language model developed by NVIDIA, featuring a hybrid Mixture-of-Experts (MoE) architecture with Mamba-2 and Attention layers. Designed for both reasoning and non-reasoning tasks, it can generate reasoning traces for higher accuracy on complex prompts. This model supports English, German, Spanish, French, Italian, and Japanese, and excels in agentic reasoning, code generation, and long-context understanding.

Loading preview...

Model Overview

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 is a 30 billion parameter large language model developed by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It employs a hybrid Mixture-of-Experts (MoE) architecture, combining 23 Mamba-2 and MoE layers with 6 Attention layers, and has 3.5 billion active parameters. The model can be configured to generate reasoning traces for improved accuracy on complex prompts or provide direct answers for simpler tasks.

Key Capabilities

  • Advanced Reasoning: Achieves strong performance in reasoning benchmarks, particularly with tool use (e.g., 99.2% on AIME25 with tools, 75.0% on GPQA with tools).
  • Agentic Tasks: Demonstrates competitive results in agentic benchmarks like SWE-Bench (38.8%) and TauBench V2 (49.0% average).
  • Long Context Understanding: Supports a context length of up to 1 million tokens, with strong performance on RULER-100 benchmarks (e.g., 92.9% at 256k tokens).
  • Multilingual Support: Supports English, German, Spanish, French, Italian, and Japanese, with significant multilingual pre-training data.
  • Code Generation: Trained on extensive code data and shows strong performance in coding benchmarks like LiveCodeBench (68.3%).

Good For

  • Developing AI Agent systems requiring robust reasoning capabilities.
  • Building chatbots and RAG systems that benefit from detailed instruction following and conversational quality.
  • Applications requiring long-context processing and understanding.
  • Commercial use in various AI-powered applications.