Overview
MistralLite is a 7-billion-parameter language model developed by AWS Contributors, fine-tuned from Mistral-7B-v0.1. Its primary innovation is substantially improved long-context processing, supporting up to 32K tokens. This is achieved through an adapted Rotary Embedding (rope_theta = 1000000) and a larger sliding window (16384), which let the model maintain performance over extended inputs.
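As a quick sanity check, these configuration values can be inspected directly from the published checkpoint. A minimal sketch using HuggingFace transformers follows; the model id amazon/MistralLite is the public Hub id, and the values in the comments reflect the model card rather than anything verified here.

```python
from transformers import AutoConfig

# Inspect the long-context settings described above.
config = AutoConfig.from_pretrained("amazon/MistralLite")

print(config.rope_theta)               # adapted Rotary Embedding base: expected 1000000
print(config.sliding_window)           # enlarged sliding window: expected 16384
print(config.max_position_embeddings)  # supported context length: expected 32768
```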
Key Capabilities & Differentiators
- Extended Context Handling: Unlike its base model, MistralLite is specifically fine-tuned on long contexts up to 16K tokens, demonstrating superior performance in tasks requiring deep understanding of lengthy documents.
- Improved Long Context Performance: Benchmarks show MistralLite achieving 100% accuracy on Topic Retrieval and Pass Key Retrieval for input lengths up to 13780 and 10197 tokens respectively, where Mistral-7B-Instruct-v0.1 often drops to 0-50%.
- Enhanced Question Answering: It significantly boosts accuracy on Question Answering with Long Input Texts, scoring 64.4% overall and 56.2% on the hard subset, versus 44.3% and 39.7% for the base model.
- Resource-Efficient Deployment: Designed to run on a single AWS g5.2xlarge instance using the SageMaker HuggingFace Text Generation Inference (TGI) container, making it suitable for high-performance yet resource-constrained environments (see the deployment sketch after this list).
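For reference, here is a minimal deployment sketch using the SageMaker Python SDK. The TGI container version, environment values, and generation parameters are assumptions to verify against the current model card and your account's supported images; the model id amazon/MistralLite is the public Hub id.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Assumes an AWS account with a SageMaker execution role already configured.
role = sagemaker.get_execution_role()

# Resolve a HuggingFace TGI container image (version is an assumption; pin
# whichever release your region/account supports).
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.1.0")

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "amazon/MistralLite",   # published model id on the HF Hub
        "MAX_INPUT_LENGTH": "16000",           # long-context limits; illustrative values
        "MAX_TOTAL_TOKENS": "16384",
        "MAX_BATCH_PREFILL_TOKENS": "16384",
    },
)

# Single g5.2xlarge instance, as described above.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Query the endpoint using the required prompt template.
response = predictor.predict({
    "inputs": "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>",
    "parameters": {"max_new_tokens": 400},
})
print(response)
```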
Ideal Use Cases
- Long Context Retrieval and Answering: Excels in scenarios requiring information extraction and question answering from extensive documents.
- Summarization: Highly effective for summarizing long texts due to its deep contextual understanding.
- Topic and Line Retrieval: Proven to accurately identify topics and specific lines within very long inputs.
Important Notes
- Requires a specific prompt template: <|prompter|>...</s><|assistant|> (see the usage sketch below).
- Supports various serving frameworks, including TGI, vLLM, and HuggingFace transformers.
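To illustrate the template, here is a minimal sketch using HuggingFace transformers. The model id amazon/MistralLite is the public Hub id; the dtype, device placement, and generation parameters are assumptions, not prescribed settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "amazon/MistralLite"  # published model id on the HF Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; requires a GPU with bf16 support
    device_map="auto",           # requires the accelerate package
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Wrap the user question in the required prompt template.
question = "What are the main challenges to support a long context for LLM?"
prompt = f"<|prompter|>{question}</s><|assistant|>"

output = generator(prompt, max_new_tokens=400, do_sample=False, return_full_text=False)
print(output[0]["generated_text"])
```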