MaziyarPanahi/Llama-3-8B-Instruct-64k

Visibility: Public
Parameters: 8B
Quantization: FP8
Context length (as served): 8192
Released: Apr 25, 2024
License: llama3
Source: Hugging Face
Overview

This model is an instruction-tuned variant of the Llama 3 8B architecture, distinguished by its significantly extended context window. Developed by MaziyarPanahi, it builds upon work by @winglian, specifically leveraging PoSE (Positional Skip-wisE training) to extend the context length from Llama 3's native 8k to 64k tokens.
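
In practice, the checkpoint loads like any other Llama 3 model through the Hugging Face transformers library. The snippet below is a minimal sketch, assuming a recent transformers release, PyTorch, and enough GPU memory for the 8B weights:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"

    # Load the tokenizer and weights; bfloat16 keeps the 8B model within
    # a single high-memory GPU, and device_map="auto" places layers
    # across whatever devices are available.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )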

Key Capabilities & Features

  • Extended Context Window: Utilizes PoSE to achieve a 64k token context length, enabling the model to process and generate much longer texts than standard Llama 3 8B models.
  • Continued Pre-training: The model underwent continued pre-training on 300 million tokens from the RedPajama V1 dataset, focused on samples of 6k-8k tokens, to adapt it to the extended context.
  • High rope_theta: The rope_theta parameter was set to 500,000 during PoSE extension and raised to 2,000,000 after continued pre-training, leaving headroom for extending the context beyond 64k (see the sketch after this list).
  • Instruction-tuned: Designed to follow instructions effectively, making it suitable for various conversational and task-oriented applications.
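
As a quick check, the rotary-embedding and context settings described above are exposed as standard Llama config fields in transformers. The expected values below are taken from the description on this card and should be verified against the actual checkpoint:

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("MaziyarPanahi/Llama-3-8B-Instruct-64k")

    # Standard Llama config keys; per the card, rope_theta should be on
    # the order of 2M and the position limit around 64k tokens, but
    # confirm against the checkpoint itself.
    print(config.rope_theta)
    print(config.max_position_embeddings)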

Good For

  • Applications requiring deep understanding and generation of long documents, articles, or codebases.
  • Summarization, question-answering, and analysis of extensive textual data (a usage sketch follows this list).
  • Conversational AI where maintaining context over long dialogues is crucial.
  • Tasks benefiting from a large memory of past interactions or information.
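
Building on the loading sketch above, a long-document summarization call might look like the following. This is illustrative rather than tuned: long_document is a placeholder for your own text, and the generation settings are arbitrary.

    # Reuses `tokenizer` and `model` from the loading sketch above.
    long_document = "..."  # placeholder: up to roughly 64k tokens of text

    messages = [
        {"role": "system", "content": "You summarize documents accurately."},
        {"role": "user", "content": "Summarize the following document:\n\n" + long_document},
    ]

    # Llama 3 Instruct ships a chat template; apply_chat_template builds
    # the prompt and returns input ids ready for generation.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=512)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))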