gradientai/Llama-3-8B-Instruct-Gradient-4194k

Hugging Face
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: May 4, 2024 · License: llama3 · Architecture: Transformer

The gradientai/Llama-3-8B-Instruct-Gradient-4194k is an 8 billion parameter instruction-tuned Llama 3 model developed by Gradient. This model extends the base Llama-3 8B's context length from 8K to 4194K (roughly 4.19 million) tokens, achieved through progressive training and RoPE theta adjustments. It is optimized for long-context applications, demonstrating that state-of-the-art LLMs can operate on extended contexts with minimal additional training.


Overview

This model, Llama-3 8B Instruct Gradient 4194K, is an 8 billion parameter instruction-tuned variant of Meta's Llama-3 8B. Developed by Gradient, its primary innovation is the dramatic extension of the context window from the base model's 8K tokens to 4194K tokens. This was achieved through a progressive training approach on increasing context lengths, utilizing NTK-aware interpolation for RoPE theta adjustments, and building on the EasyContext Blockwise RingAttention library for scalable training.
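To make the RoPE theta adjustment concrete, the sketch below shows the general NTK-aware idea: raising the rotary base so the positional frequencies stretch over a longer context. The scaling formula and the 512x scale factor here are illustrative assumptions for exposition; Gradient's actual progressive training schedule and per-stage theta values are not reproduced in this card.

```python
def ntk_scaled_rope_theta(base_theta: float, scale: float, head_dim: int) -> float:
    """NTK-aware interpolation: raise the RoPE base theta so the rotary
    frequencies cover a `scale`x longer context.
    (Common community formula; Gradient's exact schedule may differ.)"""
    return base_theta * scale ** (head_dim / (head_dim - 2))


def rope_frequencies(theta: float, head_dim: int) -> list[float]:
    """Per-pair rotation frequencies used by rotary position embeddings."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


# Llama-3 8B uses theta = 500000 with 128-dim attention heads.
# Going from an 8K window to 4194K is a 512x context scale.
base_theta, head_dim = 500_000.0, 128
scale = 4_194_304 / 8_192  # = 512x
new_theta = ntk_scaled_rope_theta(base_theta, scale, head_dim)
```

A larger theta slows the rotation of every frequency pair, so positions far beyond the original training window still map to angles the model has effectively seen, which is why extension needs comparatively little extra training.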

Key Capabilities

  • Massive Context Window: Processes and understands information across an exceptionally long context of up to 4194K tokens, enabling deep analysis of extensive documents or conversations.
  • Llama-3 Base Performance: Retains the strong performance characteristics of the Llama-3 8B Instruct model, which excels in general reasoning, knowledge retrieval, and instruction following.
  • Efficient Long-Context Training: Demonstrates a method for extending context with minimal additional training data (approximately 0.01% of Llama-3's original pre-training data).

Good For

  • Long-form document analysis: Summarizing, querying, or extracting information from very large texts, codebases, or datasets.
  • Complex conversational agents: Maintaining coherence and memory over extended dialogues or multi-turn interactions.
  • Applications requiring deep contextual understanding: Use cases where understanding relationships and dependencies across vast amounts of text is critical.
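For long-form document workloads, a first practical question is whether a given text fits in the 4194K-token window at all. The helper below is a rough sketch using a hypothetical words-to-tokens ratio of 1.3; for exact counts you would use the model's actual tokenizer rather than this heuristic.

```python
def fits_in_context(text: str,
                    ctx_tokens: int = 4_194_304,
                    tokens_per_word: float = 1.3) -> bool:
    """Rough check that a document fits the model's context window.

    Estimates token count as word_count * tokens_per_word (an assumed
    heuristic ratio); a real tokenizer such as the Llama-3 tokenizer
    should be used when the count matters.
    """
    est_tokens = int(len(text.split()) * tokens_per_word)
    return est_tokens <= ctx_tokens
```

Usage: `fits_in_context(open("report.txt").read())` returns whether the whole file can plausibly be sent in one request instead of being chunked.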

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each configuration tunes the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
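These parameters map directly onto an OpenAI-compatible completions request. The sketch below shows one such payload; the specific values are illustrative placeholders, not the actual user configurations tracked by Featherless.

```python
# Illustrative sampler configuration (values are example placeholders,
# not the Featherless user statistics referenced above).
sampler_settings = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
}

# Merge the sampler settings into a completions-style request body.
payload = {
    "model": "gradientai/Llama-3-8B-Instruct-Gradient-4194k",
    "prompt": "Summarize the following document:\n",
    **sampler_settings,
}
```

Lower temperature and top_p make outputs more deterministic, while repetition_penalty and min_p help keep very long generations from degenerating over an extended context.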