nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
Text Generation · Open Weights · Warm
Concurrency Cost: 4 · Model Size: 120B · Quant: FP8 · Ctx Length: 32k
Published: Mar 10, 2026 · License: nvidia-nemotron-open-model-license · Architecture: Transformer
NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is a large language model developed by NVIDIA with 120 billion total parameters, of which 12 billion are active per token. It uses a hybrid LatentMoE architecture that combines Mamba-2, MoE, and Attention layers with Multi-Token Prediction (MTP) for faster generation and improved quality. Optimized for agentic workflows, long-context reasoning up to 1 million tokens, and high-volume tasks such as IT ticket automation, the model excels at complex instruction following and tool use across English, French, German, Italian, Japanese, Spanish, and Chinese.
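The simplest way to try the model is through an OpenAI-compatible chat completions request. A minimal sketch follows; the base URL and the FEATHERLESS_API_KEY environment variable name are assumptions, so substitute the values from your provider's documentation.

```python
# Minimal sketch: querying the model through an OpenAI-compatible endpoint.
# The base_url and the environment variable name are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",   # assumed endpoint
    api_key=os.environ["FEATHERLESS_API_KEY"],  # assumed env var name
)

response = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    messages=[
        {"role": "system", "content": "You are a helpful IT support assistant."},
        {"role": "user", "content": "Summarize this ticket: VPN drops every 30 minutes on the Berlin office network."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```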
Popular Sampler Settings
The three most common sampler configurations used by Featherless users for this model.
temperature: – · top_p: – · top_k: – · frequency_penalty: – · presence_penalty: – · repetition_penalty: – · min_p: –
(Per-tab values did not load in this snapshot.)
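For reference, here is a sketch of how these sampler fields map onto an OpenAI-compatible request. The values shown are illustrative placeholders, not the unlisted user configurations above. temperature, top_p, frequency_penalty, and presence_penalty are standard request fields; top_k, repetition_penalty, and min_p are non-standard extras, and the assumption here is that the server accepts them via extra_body.

```python
# Sketch: passing the sampler parameters listed above in a request.
# All values are illustrative placeholders, not the users' actual configs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",   # assumed endpoint
    api_key=os.environ["FEATHERLESS_API_KEY"],  # assumed env var name
)

response = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    messages=[{"role": "user", "content": "Write a haiku about log files."}],
    temperature=0.7,            # standard sampler fields
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={                # non-standard fields, assumed supported
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```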