aws-prototyping/MegaBeam-Mistral-7B-300k

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 8k | Published: May 13, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

MegaBeam-Mistral-7B-300k is a 7-billion-parameter language model from aws-prototyping, fine-tuned from Mistral-7B-Instruct-v0.2 to accept input contexts of up to 320,000 tokens, far beyond the context window of its base model. It performs strongly on long-context understanding and retrieval tasks, which makes it suitable for applications that process extensive documents or conversations.


Overview

MegaBeam-Mistral-7B-300k is a 7-billion-parameter language model developed by aws-prototyping and fine-tuned from Mistral-7B-Instruct-v0.2. Its main distinguishing feature is an extended context window of up to 320,000 tokens, ten times the base model's 32,000 tokens. One of the changes behind this extension is raising the rope_theta parameter (the base frequency of the rotary position embeddings) from the base model's 1e6 to 25e6.
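To illustrate what raising rope_theta does, the sketch below computes the standard RoPE rotation frequencies, theta^(-2i/d), as used by the Hugging Face Mistral implementation, and the number of positions each dimension pair needs to complete one full rotation. It assumes a head dimension of 128 (Mistral-7B's hidden size of 4096 split over 32 attention heads). This shows the effect of the parameter only; it is not the full fine-tuning recipe behind MegaBeam.

```python
import math


def rope_inv_freq(theta: float, head_dim: int) -> list[float]:
    """Per-pair rotation frequencies theta ** (-2i / head_dim) for i in [0, head_dim / 2)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rope_wavelengths(theta: float, head_dim: int) -> list[float]:
    """Positions each dimension pair needs to complete one full rotation (2*pi / freq)."""
    return [2 * math.pi / f for f in rope_inv_freq(theta, head_dim)]


HEAD_DIM = 128  # assumed: Mistral-7B hidden size 4096 / 32 attention heads

base = rope_wavelengths(1e6, HEAD_DIM)   # Mistral-7B-Instruct-v0.2
mega = rope_wavelengths(25e6, HEAD_DIM)  # MegaBeam-Mistral-7B-300k

for name, wl in [("theta=1e6", base), ("theta=25e6", mega)]:
    print(f"{name}: shortest {wl[0]:.2f}, longest {wl[-1]:,.0f} positions")

# The slowest-rotating pair stretches by 25 ** (126 / 128), roughly 24x.
print(f"longest-wavelength ratio: {mega[-1] / base[-1]:.2f}")
```

The highest-frequency pair is unchanged (it always rotates once every 2π positions), while the lowest-frequency pairs rotate about 24 times more slowly, so positions far past the original 32,000-token window still map to distinct, slowly varying angles.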

Key Capabilities

  • Exceptional Long-Context Handling: Designed to process and reason over very long inputs of more than 300,000 tokens, up to its 320,000-token limit.
  • Strong Retrieval Performance: Demonstrates high accuracy in tasks requiring information retrieval from extensive documents, such as PassKey (100%) and Number retrieval (96.10%) on the InfiniteBench benchmark.
  • Deployment Flexibility: Can be deployed on a single AWS g5.48xlarge instance using serving frameworks like vLLM or SageMaker DJL, with configuration adjustments for KV-cache management.
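Why KV-cache configuration matters at this context length can be seen from a back-of-envelope estimate. The sketch assumes Mistral-7B's published shape (32 layers, 8 key/value heads via grouped-query attention, head dimension 128) and a 16-bit (fp16/bf16) cache; it ignores paging overhead, activations, and model weights, so real serving frameworks will need more headroom. The g5.48xlarge has eight 24 GB NVIDIA A10G GPUs, and the per-GPU figure assumes the cache is sharded evenly across all eight.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    """Bytes of K and V stored per token across all layers (factor 2 = one K plus one V)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(num_tokens: int, **cfg: int) -> float:
    """Total KV-cache size in GiB for a single sequence of num_tokens tokens."""
    return num_tokens * kv_cache_bytes_per_token(**cfg) / 2**30


# Assumed Mistral-7B-v0.2 shape with a 16-bit cache (2 bytes per element).
MISTRAL_7B = dict(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_elem=2)

per_token = kv_cache_bytes_per_token(**MISTRAL_7B)  # 131,072 bytes = 128 KiB
full = kv_cache_gib(320_000, **MISTRAL_7B)          # 39.0625 GiB for one 320k-token sequence
per_gpu = full / 8                                  # even split over 8 GPUs (tensor parallelism)

print(f"{per_token} B/token, {full:.2f} GiB total, {per_gpu:.2f} GiB per GPU")
```

A single maximum-length request therefore needs tens of gigabytes of cache on top of the weights, which is why limits such as the maximum sequence length and the fraction of GPU memory reserved for the cache typically have to be tuned when serving this model.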

Benchmarks

The model was evaluated on InfiniteBench, a benchmark that assesses models on very long contexts (100k+ tokens). MegaBeam-Mistral-7B-300k is competitive overall and strongest on retrieval, outperforming its base model and several Llama-3 variants in multiple categories. For instance, it scores 100% on Retrieve.PassKey and 96.10% on Retrieve.Number, indicating that it can reliably extract specific information from long, noisy contexts.
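The idea behind a passkey-style retrieval test can be sketched as follows: a short "needle" containing a random number is buried at a random depth in repetitive filler text, and the model is asked to repeat the number. This is a minimal illustration in the spirit of Retrieve.PassKey, not InfiniteBench's actual code; the filler sentence, prompt wording, and scoring rule are all assumptions.

```python
import random
import re

# Hypothetical filler text; InfiniteBench uses its own noise text and prompt templates.
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. "


def make_passkey_prompt(num_filler: int, seed: int = 0) -> tuple[str, str]:
    """Bury a random 5-digit pass key at a random depth inside repeated filler text."""
    rng = random.Random(seed)
    passkey = str(rng.randint(10000, 99999))
    depth = rng.randint(0, num_filler)  # number of filler blocks placed before the key
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    context = FILLER * depth + needle + FILLER * (num_filler - depth)
    return context + "\n\nWhat is the pass key?", passkey


def score(model_output: str, passkey: str) -> bool:
    """Count an answer as correct if the first standalone 5-digit number is the key."""
    match = re.search(r"\b\d{5}\b", model_output)
    return match is not None and match.group() == passkey


def accuracy(results: list[bool]) -> float:
    """Percentage of correct answers, as reported in benchmark tables."""
    return 100.0 * sum(results) / len(results)
```

To probe long contexts, num_filler would be increased until the prompt reaches the target length as measured by the model's own tokenizer, and accuracy would be averaged over many seeds and depths.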

Good for

  • Document Analysis: Ideal for tasks involving summarization, question answering, or information extraction from very large documents, reports, or codebases.
  • Extended Conversations: Suitable for chatbots or agents that need to maintain context over extremely long dialogue histories.
  • Research and Development: Useful for researchers and developers exploring the limits of long-context language models and building applications that leverage deep contextual understanding.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model, covering these settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p