gradientai/Llama-3-70B-Instruct-Gradient-262k

Hugging Face
TEXT GENERATIONConcurrency Cost:4Model Size:70BQuant:FP8Ctx Length:8kPublished:May 3, 2024License:llama3Architecture:Transformer0.1K Warm

The gradientai/Llama-3-70B-Instruct-Gradient-262k model, developed by Gradient, is an instruction-tuned Llama 3 70B model with an extended context length of over 262,000 tokens, significantly surpassing the base model's 8k context. This extension is achieved through NTK-aware interpolation and progressive training on augmented data, making it highly suitable for applications requiring deep understanding and processing of very long documents and conversations. It demonstrates that state-of-the-art LLMs can adapt to long contexts with minimal additional training.

Loading preview...

Llama-3 70B Instruct Gradient 262K: Extended Long Context

This model, developed by Gradient, is an instruction-tuned variant of Meta's Llama 3 70B, specifically engineered to handle significantly longer contexts. While the base Llama 3 70B model has an 8k token context window, this Gradient version extends it to over 262,000 tokens.

Key Capabilities

  • Massive Context Window: Processes and understands information across extremely long documents and conversations, exceeding 262k tokens.
  • Efficient Context Extension: Achieves long context capabilities with minimal additional training (less than 0.002% of Llama-3's original pre-training data) by adjusting RoPE theta and using NTK-aware interpolation.
  • Progressive Training: Utilizes a progressive training approach on increasing context lengths, inspired by methods like Large World Model, to effectively scale context handling.
  • Robust Infrastructure: Built on the EasyContext Blockwise RingAttention library, leveraging a custom network topology for efficient training on large GPU clusters.

Good For

  • Long Document Analysis: Ideal for tasks requiring comprehension and generation based on extensive texts, such as legal documents, research papers, or large codebases.
  • Complex Conversational AI: Suitable for chatbots and agents that need to maintain coherence and context over very long dialogues.
  • Information Retrieval and Synthesis: Excels in scenarios where information needs to be extracted and synthesized from vast amounts of data within a single context window.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p