Mistral-Small-3.1-24B-Instruct-2503 is a 24 billion parameter instruction-tuned model from Mistral AI, building on Mistral Small 3. It features state-of-the-art vision understanding and an enhanced 128k token context window, while maintaining strong text performance. This model excels in both text and vision tasks, offering advanced reasoning, multilingual support, and agentic capabilities with native function calling and JSON output. It is optimized for fast-response conversational agents, local inference, programming, math reasoning, and long document understanding.
Mistral-Small-3.1-24B-Instruct-2503 Overview
Mistral-Small-3.1-24B-Instruct-2503 is a 24 billion parameter instruction-fine-tuned model developed by Mistral AI. It builds on its predecessor, Mistral Small 3, by adding state-of-the-art vision understanding and extending the context window to 128k tokens without compromising text performance. The model is designed to be exceptionally "knowledge-dense" and, once quantized, can run locally on a single RTX 4090 or a MacBook with 32GB of RAM.
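The single-GPU claim follows from simple parameter-count arithmetic. A minimal sketch, assuming the standard byte-per-parameter figures for fp16, int8, and int4 weights and ignoring activation and KV-cache overhead:

```python
# Back-of-envelope weight-memory estimate for a 24B-parameter model.
# Activation and KV-cache overhead is deliberately ignored here.
PARAMS = 24e9

def weight_gb(bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bytes_per_param / 1e9

print(f"fp16: {weight_gb(2):.0f} GB")    # ~48 GB: needs multiple GPUs
print(f"int8: {weight_gb(1):.0f} GB")    # ~24 GB: borderline on a 24 GB card
print(f"int4: {weight_gb(0.5):.0f} GB")  # ~12 GB: fits an RTX 4090 or a 32 GB MacBook
```

This is why quantization is the deciding factor: at 4-bit precision the weights alone drop to roughly a quarter of their fp16 size, leaving headroom for the KV cache on consumer hardware.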
Key Capabilities
- Vision: Analyzes images and provides insights based on visual content alongside text.
- Multilingual: Supports dozens of languages including English, French, German, Japanese, Chinese, and Arabic.
- Agent-Centric: Features best-in-class agentic capabilities with native function calling and JSON output.
- Advanced Reasoning: Offers state-of-the-art conversational and reasoning abilities.
- Extended Context: Utilizes a 128k context window for long document understanding.
- System Prompt Adherence: Maintains strong adherence to system prompts.
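Native function calling is typically exercised through an OpenAI-style chat-completions request. A minimal sketch of such a request body; the `get_weather` tool, its parameters, and the exact endpoint behavior are illustrative assumptions, not taken from this page:

```python
import json

# Illustrative chat-completions request body with one tool definition.
# The tool name and schema are hypothetical examples.
payload = {
    "model": "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

body = json.dumps(payload)  # ready to POST to an OpenAI-compatible endpoint
```

With a model that supports native function calling, the response would contain a `tool_calls` entry with JSON arguments matching the declared schema, rather than free-form text.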
Performance Highlights
In instruction evaluations, Mistral-Small-3.1-24B-Instruct-2503 demonstrates competitive performance across various benchmarks. For text tasks, it achieves 80.62% on MMLU and 88.41% on HumanEval. In vision tasks, it scores 64.00% on MMMU and 94.08% on DocVQA. For multilingual understanding, it averages 71.18% across different language groups. Its long context performance is notable with 93.96% on RULER 32K.
Good For
- Fast-response conversational agents.
- Low-latency function calling and tool use.
- Local inference for sensitive data or hobbyist projects.
- Programming and mathematical reasoning tasks.
- Long document understanding and visual analysis.
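For local inference, Mistral AI's model cards in this family recommend serving with vLLM. A hedged sketch of the invocation, assuming vLLM's Mistral-format loading flags; flag names and multimodal limits may differ across vLLM versions, so verify against your installed version:

```shell
# Serve the model behind an OpenAI-compatible API via vLLM.
# Flags follow vLLM's Mistral-format options; check your vLLM version's docs.
vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral
```

Once running, the server accepts standard chat-completions requests, including tool definitions and image inputs, on its OpenAI-compatible endpoint.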