nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct is an 8-billion-parameter language model developed by NVIDIA and built on the Llama-3.1 architecture. It is designed specifically for ultra-long-context processing, supporting up to 2 million tokens while maintaining strong performance on standard benchmarks. The model understands and follows instructions across extensive text sequences, making it well suited to applications that require deep contextual comprehension.
Model Overview
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct is an 8-billion-parameter language model from NVIDIA, part of the Nemotron-UltraLong series. Built on the Llama-3.1 architecture, it is distinguished by its long-context processing capabilities, supporting a maximum context window of 2 million tokens. It achieves this through a systematic training recipe of efficient continued pretraining followed by instruction tuning, which extends long-context understanding and instruction following without sacrificing general performance.
Key Capabilities
- Ultra-Long Context Processing: Designed to handle up to 2 million tokens, enabling deep contextual analysis over very long documents or conversations.
- Strong Instruction Following: Enhanced through instruction tuning on diverse datasets, including general, mathematics, and code domains.
- Competitive Performance: Maintains strong results on standard benchmarks (MMLU, MATH, GSM8K, HumanEval) while excelling in long-context evaluations (RULER, LV-Eval, InfiniteBench).
- Llama-3.1 Foundation: Inherits the robust capabilities of the Llama-3.1-8B-Instruct model it is built from.
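To make the 2-million-token window concrete, the sketch below estimates whether a document plausibly fits in context before sending it to the model. The ~4-characters-per-token ratio is a rough heuristic for English text, not a property of this model; exact counts require the model's own tokenizer (e.g. loaded via Hugging Face transformers), and the helper names are illustrative.

```python
# Rough feasibility check against the 2M-token context window.
# ASSUMPTION: ~4 characters per token is a crude English-text heuristic;
# real token counts come from the model's tokenizer, not from this estimate.

MAX_CONTEXT_TOKENS = 2_000_000
CHARS_PER_TOKEN = 4  # heuristic, not a model constant

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(document: str, reserved_for_output: int = 4096) -> bool:
    """Check whether a document plausibly fits, leaving room for the reply."""
    return estimate_tokens(document) + reserved_for_output <= MAX_CONTEXT_TOKENS

# A ~1M-character document comfortably fits a 2M-token window.
print(fits_in_context("x" * 1_000_000))  # True
```

In practice, replace the heuristic with a real tokenizer count before deciding how much of a document to include.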
Good For
- Applications requiring analysis or generation over extremely long texts, such as legal documents, research papers, or extensive codebases.
- Complex instruction-following tasks where context length is a critical factor.
- Conversational AI systems that need to maintain coherence and context over prolonged interactions.
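For the long-document use cases above, a typical pattern is to place the entire document and a question in a single user turn of the standard Llama-3.1 chat format. The sketch below assembles such a message list; the helper function and prompt wording are assumptions for illustration, and the resulting list is what you would pass to `tokenizer.apply_chat_template` in Hugging Face transformers.

```python
# Sketch: building a long-document QA prompt as a chat-message list.
# ASSUMPTION: the "system"/"user" role names follow the standard chat
# convention used by transformers' apply_chat_template; the helper and
# prompt text are illustrative, not taken from the model card.

def build_long_context_messages(document: str, question: str) -> list[dict]:
    """Pack a full document plus a question into one user turn."""
    system = "You are a careful assistant. Answer using only the provided document."
    user = f"Document:\n{document}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_long_context_messages(
    "(full contract text, possibly hundreds of thousands of tokens...)",
    "What are the termination conditions?",
)
# Downstream (not run here): tokenizer.apply_chat_template(messages, tokenize=False)
```

Because the model keeps the document and question in one context window, no retrieval or chunking step is needed for inputs under the 2M-token limit.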