MaziyarPanahi/Llama-3-8B-Instruct-64k

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 25, 2024 · License: llama3 · Architecture: Transformer

MaziyarPanahi/Llama-3-8B-Instruct-64k is an 8-billion-parameter instruction-tuned Llama 3 model whose context length has been extended to 64k tokens using PoSE (Positional Skip-wisE training). The model was further pre-trained on 300M tokens from the RedPajama V1 dataset, restricted to samples between 6k and 8k tokens in length. It is designed for applications that process and generate long text sequences, leveraging its significantly expanded context window.
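The snippet below is a minimal sketch of loading the model with the Hugging Face transformers library. The dtype, device placement, prompt, and generation settings are illustrative assumptions, not values prescribed by this card.

```python
# Minimal loading/inference sketch using Hugging Face transformers.
# torch_dtype and device_map are assumptions; adjust to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Give a one-paragraph summary of PoSE."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```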


MaziyarPanahi/Llama-3-8B-Instruct-64k Overview

This model is an instruction-tuned variant of the Llama 3 8B architecture, distinguished by its significantly extended context window. Developed by MaziyarPanahi, it builds on work by @winglian, using the PoSE (Positional Skip-wisE training) technique to expand the context length from Llama 3's native 8k to 64k tokens.

Key Capabilities & Features

  • Extended Context Window: Utilizes PoSE to achieve a 64k token context length, enabling the model to process and generate much longer texts than standard Llama 3 8B models.
  • Continued Pre-training: The model underwent continued pre-training on 300 million tokens from the RedPajama V1 dataset, focusing on samples between 6k and 8k tokens in length, to optimize performance at the extended context.
  • High rope_theta: The rope_theta parameter was set to 500000.0 during PoSE extension and raised to 2M after continued pre-training, suggesting headroom for even greater context extension (see the config sketch after this list).
  • Instruction-tuned: Designed to follow instructions effectively, making it suitable for various conversational and task-oriented applications.
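
The rotary-embedding settings mentioned above can be read directly from the published model config using standard transformers APIs. This is a small sketch; the values noted in the comments reflect this card's description, not a verified dump of the repository's config.json.

```python
# Inspect RoPE settings from the published config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("MaziyarPanahi/Llama-3-8B-Instruct-64k")
print(config.rope_theta)               # per the card: 2M after continued pre-training
print(config.max_position_embeddings)  # expected to reflect the 64k window
```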

Good For

  • Applications requiring deep understanding and generation of long documents, articles, or codebases.
  • Summarization, question-answering, and analysis of extensive textual data (see the usage sketch after this list).
  • Conversational AI where maintaining context over long dialogues is crucial.
  • Tasks benefiting from a large memory of past interactions or information.
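
As an illustration of the long-document use cases above, the sketch below feeds a lengthy file to the model loaded in the earlier example. The file path, system prompt, and generation settings are hypothetical placeholders.

```python
# Long-context summarization sketch; assumes `tokenizer` and `model` from
# the loading example above. "report.txt" is a hypothetical long document
# that should fit, together with the prompt, inside the 64k-token window.
with open("report.txt") as f:
    long_text = f.read()

messages = [
    {"role": "system", "content": "You are a careful technical summarizer."},
    {"role": "user", "content": f"Summarize the key findings:\n\n{long_text}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=400, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```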