unsloth/Mistral-Small-3.1-24B-Base-2503

24B parameters · Vision · Released: Mar 18, 2025 · License: apache-2.0
Overview

Mistral-Small-3.1-24B-Base-2503 is a 24 billion parameter base model from Mistral AI, enhancing the previous Mistral Small 3 with significant multimodal capabilities. It integrates state-of-the-art vision understanding and extends its context window to 128k tokens, all while maintaining robust text performance. This model serves as the base for the instruction-tuned Mistral-Small-3.1-24B-Instruct-2503.

Key Capabilities

  • Vision: Analyzes images and provides insights based on visual content, complementing its text understanding.
  • Extended Context: Features a 128k token context window, enabling processing of very long inputs.
  • Multilingual Support: Supports dozens of languages, including English, French, German, Japanese, Korean, Chinese, and Arabic.
  • Apache 2.0 License: Allows for broad commercial and non-commercial use and modification.
  • Strong Benchmarks: Achieves competitive results in pretraining evaluations, scoring 81.01% on MMLU (5-shot) and 59.27% on MMMU, outperforming Gemma 3 27B PT on several key metrics (a sketch of what 5-shot prompting looks like follows this list).
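
For context on the "MMLU (5-shot)" figure above: base models are evaluated by prefixing five worked question-answer pairs before the test question and letting the model complete the answer. A minimal sketch of how such a prompt is assembled (the questions below are illustrative placeholders, not actual benchmark items):

```python
# Build a 5-shot completion prompt of the kind used in MMLU-style evals.
# The examples are illustrative placeholders, not real MMLU questions.
few_shot = [
    ("2 + 2 = ?", "4"),
    ("The chemical symbol for gold is?", "Au"),
    ("Largest planet in the solar system?", "Jupiter"),
    ("Author of 'Hamlet'?", "William Shakespeare"),
    ("Boiling point of water at sea level in Celsius?", "100"),
]
test_question = "The capital of France is?"

prompt = "".join(f"Question: {q}\nAnswer: {a}\n\n" for q, a in few_shot)
prompt += f"Question: {test_question}\nAnswer:"

# The string is fed to the base model as a plain completion; no chat
# template is applied, since this checkpoint is not instruction-tuned.
print(prompt)
```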

Usage and Recommendations

This is a pretrained-only checkpoint: it has not been instruction-tuned and will not follow chat-style instructions out of the box. For instruction-following applications, use Mistral-Small-3.1-24B-Instruct-2503 instead. Mistral AI recommends running the model with the vLLM library; a vLLM nightly build may be needed for compatibility with the Mistral tokenizer and the model's multimodal features, as sketched below.
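
A minimal offline-completion sketch with vLLM. The tokenizer_mode flag follows Mistral's published vLLM guidance for this model family, but treat it as an assumption and verify against the documentation for your installed vLLM version:

```python
# pip install -U vllm  (a nightly/pre-release build may be required; see above)
from vllm import LLM, SamplingParams

# tokenizer_mode="mistral" follows Mistral's vLLM guidance for this family;
# Mistral's original repos also use config_format/load_format "mistral".
# Verify against your vLLM version (assumption, not confirmed here).
llm = LLM(
    model="unsloth/Mistral-Small-3.1-24B-Base-2503",
    tokenizer_mode="mistral",
)

# A base model is prompted with plain text to complete, not chat messages.
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["The three primary colors are"], params)
print(outputs[0].outputs[0].text)
```

For serving behind an OpenAI-compatible endpoint, the rough equivalent is `vllm serve unsloth/Mistral-Small-3.1-24B-Base-2503 --tokenizer-mode mistral`; again, check the flags against the vLLM release you have installed.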