chutesai/Mistral-Small-3.1-24B-Instruct-2503

Vision · 24B parameters · FP8 · 32,768-token serving context
Released: Mar 24, 2025 · License: apache-2.0 · Weights: Hugging Face
Overview

Mistral-Small-3.1-24B-Instruct-2503 is an instruction-finetuned model from Mistral AI with 24 billion parameters. It builds on its predecessor, Mistral Small 3 (2501), adding state-of-the-art vision understanding and extending the context window to 128k tokens without compromising text performance. The model is designed to be "knowledge-dense" and can be deployed locally, fitting on a single RTX 4090 or a MacBook with 32 GB of RAM once quantized.

Key Capabilities

  • Vision: Analyzes images and provides insights based on visual content alongside text.
  • Multilingual: Supports dozens of languages, including English, French, German, Japanese, Chinese, and Arabic.
  • Agent-Centric: Offers robust agentic capabilities with native function calling and structured JSON output.
  • Advanced Reasoning: Delivers strong conversational and reasoning performance.
  • Long Context: Features a 128k context window, with strong performance on LongBench v2 and RULER benchmarks.
  • System Prompt Adherence: Maintains strong adherence to system prompts.
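As a sketch of the native function-calling capability above, the snippet below declares a tool in the OpenAI-style schema that many serving stacks accept for this model. The tool name and its fields are hypothetical examples, not part of the model's API:

```python
import json

# Hypothetical weather-lookup tool, declared as an OpenAI-style
# function-calling schema (JSON Schema for the parameters).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

request = {
    "model": "chutesai/Mistral-Small-3.1-24B-Instruct-2503",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(request, indent=2))
```

When the model decides to call the tool, the response contains a `tool_calls` entry with JSON arguments matching this schema, which the client executes and feeds back as a `tool` message.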

Benchmark Highlights

The model demonstrates competitive performance across various benchmarks:

  • Text Evals: Achieves 80.62% on MMLU and 88.41% on HumanEval.
  • Vision Evals: Scores 64.00% on MMMU and 68.91% on MathVista, outperforming several comparable models.
  • Multilingual Evals: Shows an average of 71.18% across European, East Asian, and Middle Eastern languages.

Good For

  • Fast-response conversational agents.
  • Low-latency function calling.
  • Subject matter experts via fine-tuning.
  • Local inference for hobbyists and organizations handling sensitive data.
  • Programming and math reasoning.
  • Long document understanding.
  • Visual understanding.
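For the visual-understanding use case, OpenAI-compatible servers commonly accept images as data-URL content parts in a multimodal message. The sketch below builds such a message from raw bytes; the stand-in PNG header is a placeholder, and a real call would read an actual image file:

```python
import base64

def image_part(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url content part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

# Stand-in bytes (PNG magic number plus padding), not a real image.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        image_part(fake_png),
    ],
}
```

Such a message slots into the same `messages` list as plain-text turns, so vision and text requests share one request format.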