aws-prototyping/MegaBeam-Mistral-7B-300k

License: apache-2.0
Overview

MegaBeam-Mistral-7B-300k is a 7-billion-parameter language model developed by aws-prototyping and fine-tuned from Mistral-7B-Instruct-v0.2. Its primary distinguishing feature is its extended context window, which supports up to 320,000 tokens, a tenfold increase over the base model's 32,000 tokens. The extension is achieved through modifications such as raising the rope_theta (RoPE base frequency) parameter to 25e6.
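
As a quick sanity check on the long-context configuration, the sketch below loads the model with Hugging Face transformers and prints the relevant config fields. This is a minimal illustration of standard transformers usage, not an official recipe; exact field values may vary by model revision.

```python
# Minimal sketch (assumption: standard transformers usage, not the authors'
# exact setup): load MegaBeam-Mistral-7B-300k and inspect its long-context settings.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "aws-prototyping/MegaBeam-Mistral-7B-300k"

config = AutoConfig.from_pretrained(model_id)
print(config.rope_theta)               # raised RoPE base frequency (25e6)
print(config.max_position_embeddings)  # extended context length

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # needs enough GPU memory for the 7B weights
    device_map="auto",    # requires the `accelerate` package
)
```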

Key Capabilities

  • Exceptional Long-Context Handling: Designed to process and reason over very long inputs, exceeding 300,000 tokens.
  • Strong Retrieval Performance: Demonstrates high accuracy in tasks requiring information retrieval from extensive documents, such as PassKey (100%) and Number retrieval (96.10%) on the InfiniteBench benchmark.
  • Deployment Flexibility: Can be deployed on a single AWS g5.48xlarge instance using serving frameworks such as vLLM or SageMaker DJL, with configuration adjustments for KV-cache management (see the serving sketch after this list).
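
The following is a hedged sketch of what such a deployment might look like with vLLM's offline Python API. The engine arguments (tensor_parallel_size, max_model_len, gpu_memory_utilization) are illustrative assumptions that must be tuned to the instance's GPU memory and the vLLM version in use; they are not settings taken from the model card.

```python
# Illustrative example: serving MegaBeam-Mistral-7B-300k with vLLM.
# A g5.48xlarge exposes 8x A10G (24 GB) GPUs; the usable context length is
# ultimately bounded by how much KV cache fits across them.
from vllm import LLM, SamplingParams

llm = LLM(
    model="aws-prototyping/MegaBeam-Mistral-7B-300k",
    tensor_parallel_size=8,        # spread weights and KV cache across all 8 GPUs
    max_model_len=300_000,         # lower this if the KV cache does not fit
    gpu_memory_utilization=0.95,   # leave a little headroom per GPU
)

sampling = SamplingParams(temperature=0.0, max_tokens=256)
prompt = "<very long document>\n\nQuestion: What is the passkey mentioned above?"
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```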

Benchmarks

Evaluated on InfiniteBench, a benchmark that assesses models on super-long contexts (100k+ tokens), MegaBeam-Mistral-7B-300k shows competitive performance, particularly on retrieval tasks, outperforming its base model and several Llama-3 long-context variants in a number of categories. For instance, it achieves 100% on Retrieve.PassKey and 96.10% on Retrieve.Number, indicating a strong ability to extract specific information from noisy, long contexts.

Good for

  • Document Analysis: Ideal for tasks involving summarization, question answering, or information extraction from very large documents, reports, or codebases.
  • Extended Conversations: Suitable for chatbots or agents that need to maintain context over extremely long dialogue histories.
  • Research and Development: Useful for researchers and developers exploring the limits of long-context language models and building applications that leverage deep contextual understanding.