Overview
MistralLite is a 7-billion-parameter language model developed by AWS Contributors, fine-tuned from Mistral-7B-v0.1. Its primary innovation is substantially improved long-context processing, supporting up to 32K tokens. This is achieved through an adapted Rotary Embedding (rope_theta = 1000000) and a larger sliding window (16384), which let the model maintain performance over extended inputs.
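As a quick sanity check, these configuration values can be inspected directly from the published checkpoint. A minimal sketch using HuggingFace transformers follows; the model id amazon/MistralLite is the public Hub id, and the values in the comments reflect the model card rather than anything verified here.

```python
from transformers import AutoConfig

# Inspect the long-context settings described above.
config = AutoConfig.from_pretrained("amazon/MistralLite")

print(config.rope_theta)               # adapted Rotary Embedding base: expected 1000000
print(config.sliding_window)           # enlarged sliding window: expected 16384
print(config.max_position_embeddings)  # supported context length: expected 32768
```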
Key Capabilities & Differentiators
- Extended Context Handling: Unlike its base model, MistralLite is specifically fine-tuned on long contexts up to 16K tokens, demonstrating superior performance in tasks requiring deep understanding of lengthy documents.
- Improved Long Context Performance: Benchmarks show MistralLite achieving 100% accuracy on Topic Retrieval and Pass Key Retrieval for input lengths up to 13780 and 10197 tokens respectively, where Mistral-7B-Instruct-v0.1 often drops to 0-50%.
- Enhanced Question Answering: It significantly boosts accuracy on Question Answering with Long Input Texts, scoring 64.4% overall and 56.2% on the hard subset, versus 44.3% and 39.7% for the base model.
- Resource-Efficient Deployment: Designed to run on a single AWS g5.2xlarge instance using the SageMaker HuggingFace Text Generation Inference (TGI) container, making it suitable for high-performance yet resource-constrained environments (see the deployment sketch after this list).
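For reference, here is a minimal deployment sketch using the SageMaker Python SDK. The TGI container version, environment values, and generation parameters are assumptions to verify against the current model card and your account's supported images; the model id amazon/MistralLite is the public Hub id.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Assumes an AWS account with a SageMaker execution role already configured.
role = sagemaker.get_execution_role()

# Resolve a HuggingFace TGI container image (version is an assumption; pin
# whichever release your region/account supports).
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.1.0")

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "amazon/MistralLite",   # published model id on the HF Hub
        "MAX_INPUT_LENGTH": "16000",           # long-context limits; illustrative values
        "MAX_TOTAL_TOKENS": "16384",
        "MAX_BATCH_PREFILL_TOKENS": "16384",
    },
)

# Single g5.2xlarge instance, as described above.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Query the endpoint using the required prompt template.
response = predictor.predict({
    "inputs": "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>",
    "parameters": {"max_new_tokens": 400},
})
print(response)
```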
Ideal Use Cases
- Long Context Retrieval and Answering: Excels in scenarios requiring information extraction and question answering from extensive documents.
- Summarization: Highly effective for summarizing long texts due to its deep contextual understanding.
- Topic and Line Retrieval: Proven to accurately identify topics and specific lines within very long inputs.
Important Notes
- Requires a specific prompt template: <|prompter|>...</s><|assistant|> (see the usage sketch below).
- Supports various serving frameworks, including TGI, vLLM, and HuggingFace transformers.
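To illustrate the template, here is a minimal sketch using HuggingFace transformers. The model id amazon/MistralLite is the public Hub id; the dtype, device placement, and generation parameters are assumptions, not prescribed settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "amazon/MistralLite"  # published model id on the HF Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; requires a GPU with bf16 support
    device_map="auto",           # requires the accelerate package
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Wrap the user question in the required prompt template.
question = "What are the main challenges to support a long context for LLM?"
prompt = f"<|prompter|>{question}</s><|assistant|>"

output = generator(prompt, max_new_tokens=400, do_sample=False, return_full_text=False)
print(output[0]["generated_text"])
```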