nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is a 120 billion total parameter (12 billion active) large language model developed by NVIDIA. It features a hybrid LatentMoE architecture combining Mamba-2, MoE, and Attention layers, enhanced with Multi-Token Prediction (MTP) for faster generation and improved quality. Optimized for agentic workflows, long-context reasoning up to 1 million tokens, and high-volume tasks like IT ticket automation, this model excels in complex instruction following and tool use across English, French, German, Italian, Japanese, Spanish, and Chinese.
Loading preview...
Model Overview
NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is a powerful large language model from NVIDIA, featuring a unique Latent Mixture-of-Experts (LatentMoE) architecture. This hybrid design integrates Mamba-2, MoE, and Attention layers, and notably includes Multi-Token Prediction (MTP) layers for enhanced text generation speed and quality. The model has 120 billion total parameters with 12 billion active parameters and supports an impressive context length of up to 1 million tokens.
Key Capabilities
- Advanced Agentic Workflows: Designed for collaborative AI agents, supporting complex multi-step reasoning and tool use.
- Long-Context Reasoning: Excels at processing and understanding information across extremely long contexts, up to 1 million tokens.
- High-Volume Workloads: Optimized for demanding applications such as IT ticket automation and other high-throughput tasks.
- Configurable Reasoning: Users can enable or disable a reasoning trace via the chat template, allowing for flexible control over model behavior.
- Multilingual Support: Capable in English, French, German, Italian, Japanese, Spanish, and Chinese.
Good For
- Developers building AI Agent systems requiring robust reasoning and tool-calling capabilities.
- Applications needing to process and generate content with very long context windows, such as RAG systems.
- Chatbots and conversational AI that benefit from deep understanding and instruction following.
- Automating high-volume enterprise tasks like customer support or IT operations.
For more technical details, refer to the NVIDIA Nemotron 3 Super Technical Report.
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.