mistralai/Mistral-Small-3.1-24B-Base-2503
Mistral-Small-3.1-24B-Base-2503 is a 24-billion-parameter base model developed by Mistral AI, building on Mistral Small 3. It adds state-of-the-art vision understanding and extends the context window to 128k tokens while maintaining strong text performance. This multilingual model supports dozens of languages, is designed for advanced text and vision tasks, and serves as the base for the instruction-tuned variants.
Mistral-Small-3.1-24B-Base-2503 Overview
Mistral-Small-3.1-24B-Base-2503 is the pre-trained foundation of the Mistral Small 3.1 family. Relative to its predecessor, Mistral Small 3, it introduces vision understanding and expands the context window to 128k tokens without compromising text processing. It is the checkpoint from which the instruction-tuned Mistral-Small-3.1-24B-Instruct-2503 is derived.
Key Capabilities
- Multimodal Vision: Processes images alongside text, enabling analysis and reasoning over visual content.
- Extended Context Window: Features a 128k-token context window, enabling long documents to be processed in a single pass (see the sketch after this list).
- Multilingual Support: Capable of handling dozens of languages, including English, French, German, Japanese, Korean, Chinese, and many others.
- Apache 2.0 License: Licensed under Apache 2.0, permitting commercial and non-commercial use and modification.
- Strong Base Performance: Achieves competitive benchmark results in pre-training evaluations, including 81.01% on MMLU (5-shot) and 59.27% on MMMU, outperforming Gemma 3 27B PT in several key metrics.
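To make the 128k-token figure concrete, the sketch below checks whether an input fits the window before inference. It is a minimal illustration, assuming the repo's tokenizer loads via Hugging Face AutoTokenizer (not something this page guarantees); the fits_in_context helper is hypothetical.

```python
# Minimal sketch: budgeting an input against the advertised 128k-token
# context window. Assumes the official repo's tokenizer loads through
# AutoTokenizer (an assumption, not stated on this page).
from transformers import AutoTokenizer

MAX_CONTEXT = 128_000  # advertised context window, in tokens

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-3.1-24B-Base-2503")

def fits_in_context(text: str, reserve_for_output: int = 1_024) -> bool:
    """Return True if `text` plus an output-token budget fits in the window."""
    n_tokens = len(tok(text)["input_ids"])
    return n_tokens + reserve_for_output <= MAX_CONTEXT

print(fits_in_context("The quick brown fox jumps over the lazy dog."))
```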
Usage Notes
This is a pre-trained base model: it has not been instruction-tuned, so it continues text rather than following prompts as directives. For production-ready instruction following, use the instruction-tuned variant. Mistral AI recommends serving the model with the vLLM library for optimized inference, as in the sketch below.
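A minimal sketch of offline text completion with vLLM, assuming a recent vLLM release and a GPU with enough memory for a 24B model. The mistral-format flags (tokenizer_mode, load_format, config_format) follow Mistral's usual vLLM recipe for its checkpoints and should be verified against the official model card.

```python
# Minimal sketch: offline text completion with vLLM. As a base model,
# it continues prompts rather than answering them; there is no chat
# template. The mistral-format flags are assumptions from Mistral's
# usual vLLM recipe and may need adjusting for your vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Small-3.1-24B-Base-2503",
    tokenizer_mode="mistral",  # tokenizer shipped with the repo
    load_format="mistral",
    config_format="mistral",
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["The three primary colors are"], params)
print(outputs[0].outputs[0].text)
```

Because the model is not instruction-tuned, few-shot prompting (placing worked examples directly in the prompt) generally yields more reliable completions than bare questions.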