3DJ77/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

TEXT GENERATIONConcurrency Cost:2Model Size:30BQuant:FP8Ctx Length:32kPublished:Apr 1, 2026License:otherArchitecture:Transformer Cold

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 is a 30 billion parameter large language model developed by NVIDIA, featuring a hybrid Mixture-of-Experts (MoE) architecture with Mamba-2 and Attention layers. Designed for both reasoning and non-reasoning tasks, it can generate explicit reasoning traces for higher accuracy on complex prompts. This model supports a 1M token context length and is optimized for agentic systems, chatbots, and RAG applications across English and several other languages.

Loading preview...

Model Overview

NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 is a 30 billion parameter large language model (LLM) developed by NVIDIA, featuring a unique hybrid Mixture-of-Experts (MoE) architecture. It combines 23 Mamba-2 and MoE layers with 6 Attention layers, activating 6 out of 128 experts plus 1 shared expert per token, resulting in 3.5 billion active parameters. The model is designed for both reasoning and non-reasoning tasks, capable of generating explicit reasoning traces to improve accuracy on challenging prompts, a feature configurable via the chat template.

Key Capabilities

  • Advanced Reasoning: Can generate step-by-step reasoning traces for complex problems, enhancing solution quality.
  • Hybrid MoE Architecture: Leverages a Mamba-2 and Transformer hybrid MoE design for efficiency and performance.
  • Extensive Context Window: Supports an impressive 1 million token context length, suitable for long-document analysis.
  • Multilingual Support: Supports English, German, Spanish, French, Italian, and Japanese, with improved performance using Qwen.
  • Commercial Use Ready: Licensed for commercial applications.
  • Comprehensive Training: Trained on 25 trillion tokens, including a significant portion of synthetic data across code, math, science, and general knowledge.

Good For

  • AI Agent Systems: Ideal for developers building sophisticated AI agents that require robust reasoning capabilities.
  • Chatbots and Conversational AI: Suitable for creating high-quality, instruction-following chatbots.
  • RAG Systems: Effective for Retrieval-Augmented Generation applications due to its long context handling.
  • Instruction Following: Excels at general instruction-following tasks, with configurable reasoning behavior.