gradientai/Llama-3-70B-Instruct-Gradient-1048k
gradientai/Llama-3-70B-Instruct-Gradient-1048k is a 70-billion-parameter instruction-tuned language model developed by Gradient. It extends the context length of the base Meta Llama 3 70B Instruct model from 8,192 tokens to over 1 million tokens, using techniques such as NTK-aware interpolation of the RoPE base (theta) and progressive long-context training. The model is optimized for extremely long contexts, making it suitable for applications such as extensive document analysis or prolonged conversational memory.
Overview
This model, developed by Gradient, is an extended-context version of the Meta Llama 3 70B Instruct model. Its primary innovation is the increase in context window from the original 8,192 tokens to 1,048,576 (roughly 1M) tokens. This was achieved through NTK-aware interpolation to adjust the RoPE theta (base frequency) and progressive training on increasing context lengths, similar to the Large World Model approach.
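The NTK-aware adjustment mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not the checkpoint's exact recipe: the released model's actual `rope_theta` was tuned per training stage and is recorded in its config, so the numbers below (Llama 3's published base theta of 500,000 and head dimension of 128) are used purely for illustration. The idea is that scaling the context by a factor `s` scales theta by `s**(d/(d-2))`, which interpolates the low-frequency RoPE components while leaving the high-frequency ones nearly unchanged.

```python
import math

def ntk_scaled_theta(theta: float, scale: float, head_dim: int) -> float:
    """NTK-aware RoPE base adjustment: theta' = theta * s**(d / (d - 2))."""
    return theta * scale ** (head_dim / (head_dim - 2))

base_theta = 500_000.0      # Llama 3's published RoPE base frequency
head_dim = 128              # Llama 3 70B attention head dimension
scale = 1_048_576 / 8_192   # stretching 8k context to ~1M -> s = 128

theta_prime = ntk_scaled_theta(base_theta, scale, head_dim)
print(f"scaled theta ~= {theta_prime:.3e}")  # on the order of 7e7
```

Note that `theta'` grows slightly faster than `theta * s`: the exponent `d/(d-2)` is just above 1, which is what keeps the highest-frequency rotary components close to their original wavelengths.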
Key Capabilities
- Massive Context Window: Processes and understands information across extremely long sequences, exceeding 1 million tokens.
- Llama 3 70B Foundation: Inherits the strong performance and instruction-following capabilities of the base Llama 3 70B Instruct model.
- Efficient Training: Achieved long-context capabilities with minimal additional training data (less than 0.003% of Llama 3's original pre-training data).
Good For
- Applications requiring deep analysis of very long documents or codebases.
- Complex conversational agents needing extensive memory and context retention.
- Tasks involving summarization, question-answering, or generation over large bodies of text.
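For the long-document use cases above, the practical constraint is usually KV-cache memory rather than model quality. A back-of-envelope estimate, assuming the published Llama 3 70B architecture (80 decoder layers, 8 KV heads under grouped-query attention, head dimension 128) and fp16/bf16 cache storage:

```python
# KV-cache size estimate for a full ~1M-token context on Llama 3 70B.
# Architecture values are the published Llama 3 70B numbers.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
BYTES_PER_VALUE = 2          # fp16 / bf16
CONTEXT = 1_048_576          # ~1M tokens, per the model name

# K and V each store KV_HEADS * HEAD_DIM values per layer per token.
bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
cache_bytes = bytes_per_token * CONTEXT

print(f"{bytes_per_token} bytes/token -> {cache_bytes / 2**30:.0f} GiB at full context")
# -> 327680 bytes/token -> 320 GiB at full context
```

So filling the entire window needs on the order of 320 GiB for the cache alone (before weights), which is why serving setups typically rely on quantized KV caches, tensor parallelism across several GPUs, or far shorter effective contexts.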