nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is a large language model developed by NVIDIA, with 120 billion total parameters, 12 billion of which are active per token. It features a hybrid LatentMoE architecture combining Mamba-2, MoE, and Attention layers, enhanced with Multi-Token Prediction (MTP) for faster generation and improved quality. Optimized for agentic workflows, long-context reasoning up to 1M tokens, and high-volume tasks like IT ticket automation, the model excels in collaborative agent systems and complex instruction following across English, French, German, Italian, Japanese, Spanish, and Chinese.
Model Overview
NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is a large language model developed by NVIDIA, built on a Latent Mixture-of-Experts (LatentMoE) architecture. This hybrid design integrates Mamba-2, MoE, and Attention layers, and includes Multi-Token Prediction (MTP) layers for faster, higher-quality text generation. The model has 120 billion total parameters, 12 billion of which are active per token, and supports a context length of up to 1 million tokens.
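The gap between total and active parameters comes from MoE routing: each token is sent to only a few experts, so only their weights participate in the forward pass. A minimal routing sketch in plain Python (the expert count and top-k value below are illustrative, not this model's actual configuration):

```python
import math

def route_token(gate_logits, top_k=2):
    """Pick the top_k experts for one token from router logits.

    Returns (expert_indices, normalized_weights). Only these experts'
    parameters are "active" for this token; the rest are skipped.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Softmax over just the selected experts' logits
    exps = [math.exp(gate_logits[i]) for i in chosen]
    total = sum(exps)
    weights = [e / total for e in exps]
    return chosen, weights

# Illustrative numbers only: 16 experts, 2 active per token.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3,
          0.2, -2.0, 1.0, 0.4, -0.1, 0.6, 0.9, -0.3]
experts, weights = route_token(logits, top_k=2)
# With 2 of 16 expert MLPs active, only a small fraction of expert
# parameters runs per token -- the same idea behind 12B active
# out of 120B total.
```

The routing weights are re-normalized over the selected experts so their outputs can be combined as a weighted sum.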
Key Capabilities
- Advanced Agentic Workflows: Designed for building specialized AI agents, supporting complex multi-step tool use and reasoning.
- Long-Context Reasoning: Excels at processing and understanding information across extremely long contexts, up to 1M tokens, making it suitable for RAG systems and multi-document aggregation.
- High-Volume Workloads: Optimized for efficiency in tasks such as IT ticket automation and other high-throughput applications.
- Configurable Reasoning: Offers a flexible reasoning mode that can be enabled or disabled via the chat template, allowing for tailored performance.
- Multilingual Support: Supports English, French, German, Italian, Japanese, Spanish, and Chinese.
- Efficient Training: Utilizes NVFP4 quantization during pre-training to maximize compute efficiency.
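The reasoning toggle is driven by the chat template, typically via a control phrase in the system message. A minimal sketch of assembling such a request; the "detailed thinking on/off" phrase here is a hypothetical placeholder, so check the model's actual chat template for the supported switch:

```python
def build_messages(user_prompt, reasoning=True):
    """Assemble a chat-template message list with reasoning toggled.

    The control phrase below is a hypothetical example; consult the
    model's chat template for the real switch it recognizes.
    """
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Same prompt, two reasoning modes.
on = build_messages("Plan a three-step database migration.", reasoning=True)
off = build_messages("Plan a three-step database migration.", reasoning=False)
```

The resulting message list can be passed to a tokenizer's `apply_chat_template` or an OpenAI-compatible chat endpoint unchanged.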
Good For
- Developers creating AI agent systems that require robust reasoning and tool-use capabilities.
- Applications demanding long-context understanding and processing.
- Chatbots and conversational AI that need to maintain coherence over extended interactions.
- RAG systems where accurate retrieval and synthesis from large document sets are critical.
- Automating high-volume enterprise tasks like IT support or customer service ticket resolution.
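For RAG-style multi-document aggregation, the 1M-token window allows many retrieved documents to be packed into a single prompt. A minimal packing sketch; the 4-characters-per-token estimate and the budget value are illustrative assumptions, and a production system should count tokens with the model's real tokenizer:

```python
def pack_documents(docs, token_budget=1_000_000, chars_per_token=4):
    """Greedily concatenate documents until an estimated token budget is hit.

    Token counts are estimated as len(text) / chars_per_token -- a rough
    heuristic, not the model's actual tokenization.
    """
    packed, used = [], 0
    for doc in docs:
        est = len(doc) // chars_per_token + 1
        if used + est > token_budget:
            break  # stop before overflowing the context window
        packed.append(doc)
        used += est
    # Separate documents so the model can tell them apart.
    return "\n\n---\n\n".join(packed), used

docs = ["alpha " * 100, "beta " * 200, "gamma " * 50]
context, est_tokens = pack_documents(docs, token_budget=400)
```

A greedy cutoff like this keeps whole documents intact; chunking or reranking before packing is a common refinement when individual documents exceed the budget.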