MaziyarPanahi/Llama-3-8B-Instruct-64k

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 25, 2024 · License: llama3 · Architecture: Transformer

MaziyarPanahi/Llama-3-8B-Instruct-64k is an 8-billion-parameter instruction-tuned Llama 3 model whose context length has been extended to 64k tokens using PoSE (Positional Skip-wisE training). The model was further pre-trained on 300M tokens from the RedPajama V1 dataset, restricted to samples between 6k and 8k tokens in length. It is designed for applications that process and generate long text sequences, leveraging its significantly expanded context window.
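The snippet below is a minimal sketch of loading the model with the Hugging Face transformers library. The dtype, device placement, prompt, and generation settings are illustrative assumptions, not values prescribed by this card.

```python
# Minimal loading/inference sketch using Hugging Face transformers.
# torch_dtype and device_map are assumptions; adjust to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Give a one-paragraph summary of PoSE."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```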


MaziyarPanahi/Llama-3-8B-Instruct-64k Overview

This model is an instruction-tuned variant of the Llama 3 8B architecture, distinguished by its significantly extended context window. Developed by MaziyarPanahi, it builds on work by @winglian, using the PoSE (Positional Skip-wisE training) technique to expand the context length from Llama 3's native 8k to 64k tokens.

Key Capabilities & Features

  • Extended Context Window: Utilizes PoSE to achieve a 64k token context length, enabling the model to process and generate much longer texts than standard Llama 3 8B models.
  • Continued Pre-training: The model underwent continued pre-training on 300 million tokens from the RedPajama V1 dataset, focusing on samples between 6k and 8k tokens in length, to optimize performance at the extended context.
  • High rope_theta: The rope_theta parameter was set to 500000.0 during PoSE extension and raised to 2M after continued pre-training, suggesting headroom for even greater context extension (see the config sketch after this list).
  • Instruction-tuned: Designed to follow instructions effectively, making it suitable for various conversational and task-oriented applications.
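
The rotary-embedding settings mentioned above can be read directly from the published model config using standard transformers APIs. This is a small sketch; the values noted in the comments reflect this card's description, not a verified dump of the repository's config.json.

```python
# Inspect RoPE settings from the published config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("MaziyarPanahi/Llama-3-8B-Instruct-64k")
print(config.rope_theta)               # per the card: 2M after continued pre-training
print(config.max_position_embeddings)  # expected to reflect the 64k window
```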

Good For

  • Applications requiring deep understanding and generation of long documents, articles, or codebases.
  • Summarization, question-answering, and analysis of extensive textual data (see the usage sketch after this list).
  • Conversational AI where maintaining context over long dialogues is crucial.
  • Tasks benefiting from a large memory of past interactions or information.
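
As an illustration of the long-document use cases above, the sketch below feeds a lengthy file to the model loaded in the earlier example. The file path, system prompt, and generation settings are hypothetical placeholders.

```python
# Long-context summarization sketch; assumes `tokenizer` and `model` from
# the loading example above. "report.txt" is a hypothetical long document
# that should fit, together with the prompt, inside the 64k-token window.
with open("report.txt") as f:
    long_text = f.read()

messages = [
    {"role": "system", "content": "You are a careful technical summarizer."},
    {"role": "user", "content": f"Summarize the key findings:\n\n{long_text}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=400, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```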