MaziyarPanahi/Llama-3-8B-Instruct-64k Overview
This model is an instruction-tuned variant of the Llama 3 8B architecture, distinguished by its significantly extended context window. Developed by MaziyarPanahi, it builds upon work by @winglian, specifically leveraging the PoSE (Positional Skip-wise Training) technique to extend the context length from Llama 3's native 8k tokens to 64k tokens.
Key Capabilities & Features
- Extended Context Window: Utilizes PoSE to achieve a 64k token context length, enabling the model to process and generate much longer texts than standard Llama 3 8B models.
- Continued Pre-training: The model underwent continued pre-training on 300 million tokens from the RedPajama V1 dataset, focusing on sequences between 6k and 8k tokens, to adapt it to the extended context.
- High `rope_theta`: The `rope_theta` parameter was set to 500,000.0 during PoSE extension and further raised to 2M after continued pre-training, indicating potential for even greater context extension (see the sketch after this list).
- Instruction-tuned: Designed to follow instructions effectively, making it suitable for various conversational and task-oriented applications.
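The context-window and `rope_theta` settings above are carried in the model's configuration. Below is a minimal sketch of loading the model with Hugging Face transformers and inspecting those long-context settings; the repository ID comes from the model name above, while the dtype and device choices are illustrative assumptions, not requirements from the model card.

```python
# Minimal sketch: load the 64k-context model and inspect its RoPE / context settings.
# Assumes the Hugging Face `transformers` library and enough GPU memory for an 8B model
# (the dtype/device_map choices below are assumptions, not prescribed by the model card).
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"

# The config carries the extended-context parameters described above.
config = AutoConfig.from_pretrained(model_id)
print("max_position_embeddings:", config.max_position_embeddings)  # expected: ~64k
print("rope_theta:", config.rope_theta)                            # expected: large (e.g. 2M) per the card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to fit an 8B model on a single GPU
    device_map="auto",
)
```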
Good For
- Applications requiring deep understanding and generation of long documents, articles, or codebases.
- Summarization, question-answering, and analysis of extensive textual data (see the usage sketch below).
- Conversational AI where maintaining context over long dialogues is crucial.
- Tasks benefiting from a large memory of past interactions or information.
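As an illustration of the long-context use cases listed above, the sketch below feeds a long document to the model through its chat template and asks for a summary. This is a hypothetical usage example, not taken from the model card: the prompt wording, the `report.txt` placeholder, and the generation parameters are all assumptions.

```python
# Hypothetical usage sketch: summarize a long document that would overflow a standard
# 8k-context Llama 3 8B but fits within this model's 64k-token window.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

long_document = open("report.txt").read()  # placeholder: tens of thousands of tokens

messages = [
    {"role": "system", "content": "You are a careful assistant that summarizes documents."},
    {"role": "user", "content": f"Summarize the key points of the following document:\n\n{long_document}"},
]

# apply_chat_template builds the Llama 3 instruct prompt format for us.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,  # assumption: enough room for a concise summary
    do_sample=False,     # deterministic decoding for summarization
)
# Print only the newly generated tokens (the summary), not the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```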