unsloth/Nemotron-3-Nano-30B-A3B
The NVIDIA Nemotron-3-Nano-30B-A3B is a 30 billion parameter large language model developed by NVIDIA, featuring a hybrid Mixture-of-Experts (MoE) architecture with Mamba-2 and Attention layers. Designed for both reasoning and non-reasoning tasks, it can generate explicit reasoning traces to improve accuracy on complex queries. This model supports English and coding languages, with additional support for German, Spanish, French, Italian, and Japanese, making it suitable for AI agent systems, chatbots, and RAG applications.
Loading preview...
Model Overview
The NVIDIA Nemotron-3-Nano-30B-A3B is a 30 billion parameter large language model developed by NVIDIA, featuring a unique hybrid Mixture-of-Experts (MoE) architecture. It combines 23 Mamba-2 and MoE layers with 6 Attention layers, utilizing 3.5 billion active parameters out of 30 billion total. The model is designed for both reasoning and non-reasoning tasks, capable of generating a reasoning trace before providing a final response, which generally leads to higher accuracy on complex prompts. This reasoning capability can be toggled via a chat template flag.
Key Capabilities
- Hybrid MoE Architecture: Integrates Mamba-2 and MoE layers for efficient processing, with 128 experts per MoE layer and 5 experts activated per token.
- Reasoning-First Approach: Can generate explicit reasoning traces to enhance accuracy on challenging tasks, configurable via chat template.
- Multilingual Support: Supports English, German, Spanish, French, Italian, and Japanese, alongside 43 programming languages.
- Extensive Training: Pre-trained on 25 trillion tokens, including a vast corpus of code, math, science, and general knowledge data, and further fine-tuned with synthetic data.
- Long Context: Supports a context length of up to 1M tokens, though default Hugging Face configuration is 256k.
Good For
- AI Agent Systems: Ideal for developers building intelligent agents that require robust reasoning capabilities.
- Chatbots: Suitable for creating advanced conversational AI with improved accuracy and response quality.
- RAG Systems: Can be integrated into Retrieval Augmented Generation systems for enhanced information retrieval and generation.
- Instruction Following: Excels at general instruction-following tasks across various domains.