mattshumer/Llama-3-8B-16K

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 23, 2024 · Architecture: Transformer

mattshumer/Llama-3-8B-16K is an 8-billion-parameter Llama 3 base model extended to a 16K-token context length. The model was fine-tuned for five hours on the LongAlpaca-16k-length dataset, making it suitable for applications that process longer sequences of text. It uses an adjusted rope_theta of 1,000,000.0 to support this long-context capability.


mattshumer/Llama-3-8B-16K Overview

This model is an extended-context version of the Llama 3 8B base model, developed by mattshumer. While the original Llama 3 8B ships with an 8K context window, this variant has been specifically trained to support a 16K-token context length.

Key Capabilities

  • Extended Context Window: Processes significantly longer input sequences compared to the standard Llama 3 8B, enabling more comprehensive understanding and generation for lengthy documents or conversations.
  • Llama 3 Architecture: Benefits from the robust foundational architecture of the Llama 3 series.
  • Training Details: The context extension was achieved with five hours of training on 8x A6000 GPUs, using the Yukang/LongAlpaca-16k-length dataset. The rope_theta parameter was raised to 1,000,000.0 to enable the longer context.
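To see why raising rope_theta helps, recall that rotary position embeddings (RoPE) rotate each pair of hidden dimensions at a frequency derived from the base theta; a larger theta stretches the longest wavelengths so distant positions stay distinguishable. The sketch below (plain Python, no model download; the 500,000 baseline is Llama 3's commonly reported default and is an assumption here, as is the 128 head dimension) compares wavelengths at the two theta values:

```python
import math

def rope_wavelengths(head_dim: int, theta: float) -> list[float]:
    # RoPE rotates dimension pair i at frequency theta^(-2i/head_dim);
    # the corresponding wavelength, in tokens, is 2*pi / frequency.
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]

# Baseline theta of 500,000 (commonly reported for Llama 3) vs. this
# model's adjusted value of 1,000,000.
base = rope_wavelengths(128, 500_000.0)
extended = rope_wavelengths(128, 1_000_000.0)

# Raising theta stretches the slowest-rotating dimensions, so relative
# positions remain distinguishable over a longer context window.
print(f"longest wavelength at theta=5e5: {base[-1]:,.0f} tokens")
print(f"longest wavelength at theta=1e6: {extended[-1]:,.0f} tokens")
```

The fastest-rotating pair always has a wavelength of 2π tokens regardless of theta; it is the slow tail of the spectrum that the larger theta extends, which is what allows the model to resolve positions beyond the original training length.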

Good For

  • Long Document Analysis: Ideal for tasks such as summarizing lengthy articles, legal documents, or research papers.
  • Extended Conversational AI: Suitable for chatbots or virtual assistants that need to maintain context over very long dialogues.
  • Code Generation and Analysis: Can handle larger codebases or complex programming tasks requiring extensive context.
  • Research and Development: Provides a strong base for further fine-tuning on specific long-context applications.