nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is a large language model developed by NVIDIA, with 120 billion total parameters, 12 billion of which are active per token. It features a hybrid LatentMoE architecture combining Mamba-2, MoE, and Attention layers, enhanced with Multi-Token Prediction (MTP) for faster generation and improved quality. Optimized for agentic workflows, long-context reasoning up to 1M tokens, and high-volume tasks like IT ticket automation, the model excels in collaborative agent systems and complex instruction following across English, French, German, Italian, Japanese, Spanish, and Chinese.
Model Overview
NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is a large language model developed by NVIDIA, built on a Latent Mixture-of-Experts (LatentMoE) architecture. This hybrid design integrates Mamba-2, MoE, and Attention layers, and includes Multi-Token Prediction (MTP) layers for faster, higher-quality text generation. The model has 120 billion total parameters, 12 billion of which are active per token, and supports a context length of up to 1 million tokens.
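The gap between total and active parameters comes from MoE routing: each token is sent to only a few experts, so only their weights participate in the forward pass. A minimal routing sketch in plain Python (the expert count and top-k value below are illustrative, not this model's actual configuration):

```python
import math

def route_token(gate_logits, top_k=2):
    """Pick the top_k experts for one token from router logits.

    Returns (expert_indices, normalized_weights). Only these experts'
    parameters are "active" for this token; the rest are skipped.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Softmax over just the selected experts' logits
    exps = [math.exp(gate_logits[i]) for i in chosen]
    total = sum(exps)
    weights = [e / total for e in exps]
    return chosen, weights

# Illustrative numbers only: 16 experts, 2 active per token.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3,
          0.2, -2.0, 1.0, 0.4, -0.1, 0.6, 0.9, -0.3]
experts, weights = route_token(logits, top_k=2)
# With 2 of 16 expert MLPs active, only a small fraction of expert
# parameters runs per token -- the same idea behind 12B active
# out of 120B total.
```

The routing weights are re-normalized over the selected experts so their outputs can be combined as a weighted sum.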
Key Capabilities
- Advanced Agentic Workflows: Designed for building specialized AI agents, supporting complex multi-step tool use and reasoning.
- Long-Context Reasoning: Excels at processing and understanding information across extremely long contexts, up to 1M tokens, making it suitable for RAG systems and multi-document aggregation.
- High-Volume Workloads: Optimized for efficiency in tasks such as IT ticket automation and other high-throughput applications.
- Configurable Reasoning: Offers a flexible reasoning mode that can be enabled or disabled via the chat template, allowing for tailored performance.
- Multilingual Support: Supports English, French, German, Italian, Japanese, Spanish, and Chinese.
- Efficient Training: Utilizes NVFP4 quantization during pre-training to maximize compute efficiency.
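The reasoning toggle is driven by the chat template, typically via a control phrase in the system message. A minimal sketch of assembling such a request; the "detailed thinking on/off" phrase here is a hypothetical placeholder, so check the model's actual chat template for the supported switch:

```python
def build_messages(user_prompt, reasoning=True):
    """Assemble a chat-template message list with reasoning toggled.

    The control phrase below is a hypothetical example; consult the
    model's chat template for the real switch it recognizes.
    """
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Same prompt, two reasoning modes.
on = build_messages("Plan a three-step database migration.", reasoning=True)
off = build_messages("Plan a three-step database migration.", reasoning=False)
```

The resulting message list can be passed to a tokenizer's `apply_chat_template` or an OpenAI-compatible chat endpoint unchanged.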
Good For
- Developers creating AI agent systems that require robust reasoning and tool-use capabilities.
- Applications demanding long-context understanding and processing.
- Chatbots and conversational AI that need to maintain coherence over extended interactions.
- RAG systems where accurate retrieval and synthesis from large document sets are critical.
- Automating high-volume enterprise tasks like IT support or customer service ticket resolution.
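For RAG-style multi-document aggregation, the 1M-token window allows many retrieved documents to be packed into a single prompt. A minimal packing sketch; the 4-characters-per-token estimate and the budget value are illustrative assumptions, and a production system should count tokens with the model's real tokenizer:

```python
def pack_documents(docs, token_budget=1_000_000, chars_per_token=4):
    """Greedily concatenate documents until an estimated token budget is hit.

    Token counts are estimated as len(text) / chars_per_token -- a rough
    heuristic, not the model's actual tokenization.
    """
    packed, used = [], 0
    for doc in docs:
        est = len(doc) // chars_per_token + 1
        if used + est > token_budget:
            break  # stop before overflowing the context window
        packed.append(doc)
        used += est
    # Separate documents so the model can tell them apart.
    return "\n\n---\n\n".join(packed), used

docs = ["alpha " * 100, "beta " * 200, "gamma " * 50]
context, est_tokens = pack_documents(docs, token_budget=400)
```

A greedy cutoff like this keeps whole documents intact; chunking or reranking before packing is a common refinement when individual documents exceed the budget.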