mistralai/Mistral-Small-3.1-24B-Base-2503
Mistral-Small-3.1-24B-Base-2503 is a 24-billion-parameter base model developed by Mistral AI, building on Mistral Small 3. It adds state-of-the-art vision understanding and extends the context window to 128k tokens while maintaining strong text performance. This multilingual model supports dozens of languages, is designed for advanced text and vision tasks, and serves as the base for the instruction-tuned variants.
Mistral-Small-3.1-24B-Base-2503 Overview
Mistral-Small-3.1-24B-Base-2503 is the pre-trained foundation of the Mistral Small 3.1 family. Relative to its predecessor, Mistral Small 3, it introduces vision understanding and expands the context window to 128k tokens without compromising text processing. It is the checkpoint from which the instruction-tuned Mistral-Small-3.1-24B-Instruct-2503 is derived.
Key Capabilities
- Multimodal Vision: Processes images alongside text, enabling analysis and reasoning over visual content.
- Extended Context Window: Features a 128k-token context window, enabling long documents to be processed in a single pass (see the sketch after this list).
- Multilingual Support: Capable of handling dozens of languages, including English, French, German, Japanese, Korean, Chinese, and many others.
- Apache 2.0 License: Licensed under Apache 2.0, permitting commercial and non-commercial use and modification.
- Strong Base Performance: Achieves competitive benchmark results in pre-training evaluations, including 81.01% on MMLU (5-shot) and 59.27% on MMMU, outperforming Gemma 3 27B PT in several key metrics.
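To make the 128k-token figure concrete, the sketch below checks whether an input fits the window before inference. It is a minimal illustration, assuming the repo's tokenizer loads via Hugging Face AutoTokenizer (not something this page guarantees); the fits_in_context helper is hypothetical.

```python
# Minimal sketch: budgeting an input against the advertised 128k-token
# context window. Assumes the official repo's tokenizer loads through
# AutoTokenizer (an assumption, not stated on this page).
from transformers import AutoTokenizer

MAX_CONTEXT = 128_000  # advertised context window, in tokens

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-3.1-24B-Base-2503")

def fits_in_context(text: str, reserve_for_output: int = 1_024) -> bool:
    """Return True if `text` plus an output-token budget fits in the window."""
    n_tokens = len(tok(text)["input_ids"])
    return n_tokens + reserve_for_output <= MAX_CONTEXT

print(fits_in_context("The quick brown fox jumps over the lazy dog."))
```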
Usage Notes
This is a pre-trained base model: it has not been instruction-tuned, so it continues text rather than following prompts as directives. For production-ready instruction following, use the instruction-tuned variant. Mistral AI recommends serving the model with the vLLM library for optimized inference, as in the sketch below.
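A minimal sketch of offline text completion with vLLM, assuming a recent vLLM release and a GPU with enough memory for a 24B model. The mistral-format flags (tokenizer_mode, load_format, config_format) follow Mistral's usual vLLM recipe for its checkpoints and should be verified against the official model card.

```python
# Minimal sketch: offline text completion with vLLM. As a base model,
# it continues prompts rather than answering them; there is no chat
# template. The mistral-format flags are assumptions from Mistral's
# usual vLLM recipe and may need adjusting for your vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Small-3.1-24B-Base-2503",
    tokenizer_mode="mistral",  # tokenizer shipped with the repo
    load_format="mistral",
    config_format="mistral",
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["The three primary colors are"], params)
print(outputs[0].outputs[0].text)
```

Because the model is not instruction-tuned, few-shot prompting (placing worked examples directly in the prompt) generally yields more reliable completions than bare questions.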