SuperQAI2050/STEM_Code
NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 is a 30-billion-parameter base large language model developed by NVIDIA, featuring a Mamba2-Transformer Hybrid Mixture of Experts (MoE) architecture. Pre-trained on a 13.3-trillion-token corpus with a June 2025 data cutoff, it excels in mathematical reasoning, code generation, and long-context understanding, with a context window of up to 512K tokens. It is designed as a robust starting point for developers and researchers building instruction-following LLMs, and it is particularly strong in STEM domains and in multilingual applications spanning 20 human languages and 43 programming languages.
Model Overview
NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 is a 30-billion-parameter base LLM developed by NVIDIA that uses a Mamba2-Transformer Hybrid Mixture of Experts (MoE) architecture. It was pre-trained from scratch on a 13.3-trillion-token dataset with a data cutoff of June 25, 2025, and is intended for commercial use.
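The card does not describe how this model's MoE router works. The following is therefore only a generic top-k MoE sketch, not NVIDIA's implementation, and it omits the Mamba2 and attention layers entirely. It illustrates the core MoE idea: a router scores every expert for a token, only the k highest-scoring experts are evaluated, and their outputs are mixed using renormalized gate weights. The router weights, the toy experts, and `k=2` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router: List[Vector], experts: List[Expert], k: int = 2) -> Vector:
    """Route one token vector x to its top-k experts and mix their outputs.

    router[e] is a hypothetical weight row for expert e; the router logit is
    its dot product with x. Experts are assumed to preserve the vector size.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    probs = softmax(logits)
    # Keep only the k most probable experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:k]
    # Renormalize the selected gate weights so they sum to 1.
    norm = sum(probs[e] for e in top)
    out = [0.0] * len(x)
    for e in top:
        gate = probs[e] / norm
        y = experts[e](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy setup (hypothetical): expert e simply scales its input by (e + 1).
experts: List[Expert] = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, 0.0]]
print(moe_forward([1.0, 2.0], router, experts, k=2))
```

For the input `[1.0, 2.0]`, only experts 1 and 0 run. Experts 2 and 3 contribute nothing and are never called. This sparsity is what allows an MoE model to evaluate only a fraction of its total parameters for each token.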
Key Capabilities & Performance
This model demonstrates strong performance across various benchmarks, particularly excelling in:
- Mathematical Reasoning: Achieves 92.34% on GSM8K and 82.88% on MATH, significantly outperforming Qwen3-30B-A3B-Base.
- Code Generation: Scores 78.05% on HumanEval (0-shot) and 75.49% on MBPP-Sanitized (3-shot).
- Long Context Understanding: Supports context lengths of up to 512K tokens, with RULER scores of 87.50% at 64K and 70.56% at 512K. The compared Qwen3 model does not support the longer context lengths.
- Multilingual Support: Trained on data in 20 human languages and 43 programming languages, with substantial amounts of Arabic, Japanese, Chinese, and several European languages.
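The "0-shot" and "3-shot" labels refer to the number of solved examples placed in the prompt before the actual task. A base model such as this one is evaluated by continuing that prompt. The sketch below builds such prompts. The `Problem:`/`Solution:` template and the example tasks are hypothetical, not the format used to produce the reported scores.

```python
from typing import List, Tuple


def build_few_shot_prompt(shots: List[Tuple[str, str]], query: str, sep: str = "\n\n") -> str:
    """Prepend solved (problem, solution) pairs to an unsolved query.

    An empty shots list gives a 0-shot prompt; three pairs give a 3-shot prompt.
    The "Problem:/Solution:" template is a hypothetical format.
    """
    blocks = [f"Problem: {p}\nSolution: {s}" for p, s in shots]
    # The final block leaves "Solution:" open for the base model to complete.
    blocks.append(f"Problem: {query}\nSolution:")
    return sep.join(blocks)


shots = [
    ("Return the square of n.", "def square(n):\n    return n * n"),
    ("Return True if n is even.", "def is_even(n):\n    return n % 2 == 0"),
    ("Return the larger of a and b.", "def larger(a, b):\n    return a if a >= b else b"),
]
prompt = build_few_shot_prompt(shots, "Return the sum of a list xs.")
print(prompt)
```

The same function covers both settings. Passing `[]` yields a bare 0-shot prompt, and passing three solved pairs yields a 3-shot prompt.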
Training & Data
The training corpus combines crawled, curated, and synthetically generated data, with extensive coverage of code, math, science, and general knowledge. More than 3.5 trillion tokens of the training data were generated synthetically using models such as DeepSeek-R1, Mixtral-8x22B-v0.1, and Qwen2.5-72B. The model is optimized for NVIDIA GPU-accelerated systems.
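As a rough sense of scale, assuming the 3.5-trillion and 13.3-trillion figures count tokens the same way, the synthetic share of the pre-training corpus is:

```latex
\frac{3.5 \times 10^{12}}{13.3 \times 10^{12}} \approx 0.263
```

Because the card says "over" 3.5 trillion tokens, synthetic data would make up at least roughly a quarter (about 26%) of the corpus.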
Use Cases
This model is primarily intended for developers and researchers who are building and fine-tuning instruction-following LLMs, especially those requiring strong performance in STEM fields, code generation, and complex reasoning over long contexts.