Overview

MegaBeam-Mistral-7B-300k is a 7-billion-parameter language model developed by aws-prototyping and fine-tuned from Mistral-7B-Instruct-v0.2. Its distinguishing feature is an extended context window of up to 320,000 tokens, roughly ten times the base model's 32,000 tokens. The extension relies on changes such as raising the rope_theta parameter to 25e6.
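To illustrate why a larger rope_theta helps with long inputs, the sketch below applies the standard rotary position embedding (RoPE) frequency formula and compares the longest positional wavelength under two base values. The head dimension of 128 and the base value of 1e6 are assumptions not stated in this overview; only 25e6 comes from the text above. This is a rough illustration, not the authors' training code.

```python
import math

def rope_wavelengths(theta: float, head_dim: int = 128) -> list[float]:
    """Wavelength, in token positions, of each RoPE frequency pair."""
    # Standard RoPE: pair i rotates at theta ** (-2i / head_dim) radians per token,
    # so one full rotation takes 2 * pi * theta ** (2i / head_dim) tokens.
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]

BASE_THETA = 1e6       # assumption: rope_theta of the base model
MEGABEAM_THETA = 25e6  # value quoted in the overview

base_longest = max(rope_wavelengths(BASE_THETA))
long_longest = max(rope_wavelengths(MEGABEAM_THETA))
print(f"longest wavelength at theta=1e6:  {base_longest:,.0f} tokens")
print(f"longest wavelength at theta=25e6: {long_longest:,.0f} tokens")
```

A larger base stretches the slowest-rotating dimensions, so positions far apart in a long input still map to distinguishable rotation angles.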

Key Capabilities

Exceptional Long-Context Handling: Designed to process and reason over inputs longer than 300,000 tokens.
Strong Retrieval Performance: Demonstrates high accuracy in tasks requiring information retrieval from extensive documents, such as PassKey (100%) and Number retrieval (96.10%) on the InfiniteBench benchmark.
Deployment Flexibility: Can be deployed on a single AWS g5.48xlarge instance using serving frameworks like vLLM or SageMaker DJL, with configuration adjustments for KV-cache management.
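The need for KV-cache configuration follows from simple arithmetic. The sketch below estimates the cache size for a single full-length sequence, assuming the commonly cited Mistral 7B layout (32 layers, 8 key/value heads, head dimension 128) and 16-bit cache values. None of these figures appear in this overview, so treat them as assumptions.

```python
def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 32,      # assumption: Mistral 7B layer count
                   num_kv_heads: int = 8,     # assumption: grouped-query attention
                   head_dim: int = 128,       # assumption
                   bytes_per_value: int = 2   # assumption: fp16/bf16 cache
                   ) -> int:
    # Factor of 2: each layer stores one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens

GIB = 1024 ** 3
per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(320_000)
print(per_token // 1024, "KiB per token")
print(f"{full_context / GIB:.1f} GiB for a 320,000-token sequence")
```

Under these assumptions the cache needs 128 KiB per token, or about 39.1 GiB for one 320,000-token sequence. That is more than a single 24 GB A10G GPU holds, so the cache has to be spread across the eight GPUs of a g5.48xlarge.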

Benchmarks

The model was evaluated on InfiniteBench, a benchmark that assesses models on very long contexts (100k+ tokens). MegaBeam-Mistral-7B-300k shows competitive performance, particularly in retrieval tasks, outperforming its base model and several Llama-3 variants in a number of categories. For instance, it achieves 100% on Retrieve.PassKey and 96.10% on Retrieve.Number, indicating strong capabilities in extracting specific information from noisy, long contexts.
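A passkey-style retrieval task can be sketched as follows: a short "needle" sentence carrying a random number is buried at a random depth in repetitive filler text, and the model is scored on whether its answer contains that number. The filler sentences, prompt wording, and function names here are hypothetical and are not InfiniteBench's exact format.

```python
import random

# Hypothetical filler; the real benchmark uses its own noise text.
FILLER = "The grass is green. The sky is blue. The sun is yellow. "

def build_passkey_prompt(num_filler: int, seed: int = 0) -> tuple[str, str]:
    """Return (prompt, passkey) with the passkey hidden at a random depth."""
    rng = random.Random(seed)
    passkey = str(rng.randint(10000, 99999))
    needle = f"The pass key is {passkey}. Remember it. "
    chunks = [FILLER] * num_filler
    # Insert the needle somewhere between the first and last filler chunk.
    chunks.insert(rng.randint(0, num_filler), needle)
    prompt = "".join(chunks) + "What is the pass key?"
    return prompt, passkey

def is_correct(model_output: str, passkey: str) -> bool:
    # Substring scoring: the answer counts only if the digits appear verbatim.
    return passkey in model_output

prompt, passkey = build_passkey_prompt(num_filler=1000)
print(len(prompt), "characters; hidden passkey:", passkey)
```

Raising num_filler lengthens the haystack, which is how such a test can be scaled toward the 100k+ token range.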

Good for

Document Analysis: Ideal for tasks involving summarization, question answering, or information extraction from very large documents, reports, or codebases.
Extended Conversations: Suitable for chatbots or agents that need to maintain context over extremely long dialogue histories.
Research and Development: Useful for researchers and developers exploring the limits of long-context language models and building applications that leverage deep contextual understanding.
