nvidia/Llama-3.1-Nemotron-8B-UltraLong-4M-Instruct
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Mar 4, 2025 · License: cc-by-nc-4.0 · Architecture: Transformer · Open weights

nvidia/Llama-3.1-Nemotron-8B-UltraLong-4M-Instruct is an 8-billion-parameter instruction-tuned language model from NVIDIA, built on the Llama 3.1 architecture. It is designed for processing ultra-long text sequences, supporting a context window of up to 4 million tokens, and maintains strong long-context understanding and instruction-following alongside competitive results on standard benchmarks. The model suits applications such as extensive document analysis, summarization, and conversational AI over very large inputs.
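As an instruct variant of Llama 3.1, the model expects chat-formatted prompts. The sketch below builds a prompt string using the standard Llama 3.1 chat template (header and end-of-turn tokens); this is an assumption based on the base architecture, and in practice the tokenizer shipped in the model repository (via `tokenizer.apply_chat_template`) should be treated as authoritative for this checkpoint.

```python
def format_llama31_prompt(messages):
    """Render a chat as a Llama 3.1-style prompt string.

    Minimal sketch of the Llama 3.1 chat template; when running the model,
    prefer the tokenizer's own apply_chat_template from the model repo.
    """
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Cue the model to generate the assistant's turn next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama31_prompt([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the attached report."},
])
```

The resulting string can be tokenized and passed to the model (e.g. through Hugging Face `transformers` or an OpenAI-compatible serving endpoint) as the generation prompt.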
