unsloth/Mistral-Small-3.1-24B-Instruct-2503
Hugging Face
Modality: Vision · Concurrency Cost: 2 · Model Size: 24B · Quant: FP8 · Context Length: 32k · Published: Mar 18, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Mistral-Small-3.1-24B-Instruct-2503 is a 24 billion parameter instruction-finetuned model by Mistral AI, building upon Mistral Small 3. It features state-of-the-art vision understanding and an extended 128k token context window, alongside strong text performance. This model is optimized for fast-response conversational agents, low-latency function calling, and local inference, supporting a wide range of multilingual and advanced reasoning tasks.


Overview

Mistral-Small-3.1-24B-Instruct-2503, developed by Mistral AI, is a 24 billion parameter instruction-finetuned model that significantly enhances its predecessor, Mistral Small 3. It introduces state-of-the-art vision understanding and expands its long context capabilities up to 128k tokens without compromising text performance. The model is designed to be "knowledge-dense" and can be deployed locally, fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized.
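For local deployment, the model is typically served behind an OpenAI-compatible HTTP endpoint (for example via vLLM or llama.cpp). A minimal sketch of building such a chat request, assuming a local server; the endpoint URL, system prompt, and sampling settings are illustrative, not part of the model card:

```python
import json

MODEL_ID = "unsloth/Mistral-Small-3.1-24B-Instruct-2503"

def build_chat_request(user_message: str, model: str = MODEL_ID) -> dict:
    """Build an OpenAI-compatible chat completion payload for a local server."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": user_message},
        ],
        # Low temperature suits the fast-response agent use cases listed above.
        "temperature": 0.15,
        "max_tokens": 512,
    }

payload = build_chat_request("Summarize the Apache-2.0 license in one sentence.")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the server's `/v1/chat/completions` route by any OpenAI-compatible client.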

Key Capabilities

  • Vision: Analyzes images and provides insights based on visual content in addition to text.
  • Multilingual: Supports dozens of languages, including English, French, German, Japanese, Korean, Chinese, and Arabic.
  • Agent-Centric: Offers strong agentic capabilities with native function calling and structured JSON output.
  • Advanced Reasoning: Provides state-of-the-art conversational and reasoning abilities.
  • Extended Context: Features a 128k context window for long document understanding.
  • Apache 2.0 License: Allows for both commercial and non-commercial use and modification.
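The native function calling above follows the standard OpenAI-style `tools` schema when the model is served behind a compatible API. A sketch of declaring a tool and extracting a returned call; the `get_weather` tool and the sample assistant message are purely illustrative:

```python
import json

# Illustrative tool definition in the OpenAI-style schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def parse_tool_call(message: dict) -> tuple[str, dict]:
    """Extract (name, arguments) from an assistant message holding a tool call."""
    call = message["tool_calls"][0]["function"]
    return call["name"], json.loads(call["arguments"])

# Shape of the assistant message an OpenAI-compatible server would return:
assistant_msg = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_0", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
    ],
}
name, args = parse_tool_call(assistant_msg)
```

Note that `arguments` arrives as a JSON string, not a dict, so it must be decoded before dispatching to the actual function.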

Benchmark Highlights

The model demonstrates competitive performance across various benchmarks:

  • Text: Achieves 80.62% on MMLU, 88.41% on HumanEval, and 69.30% on MATH.
  • Vision: Scores 64.00% on MMMU and 68.91% on MathVista, outperforming several comparable models.
  • Multilingual: Averages 71.18% across European, East Asian, and Middle Eastern language benchmarks.
  • Long Context: Achieves 93.96% on RULER 32K, indicating robust long-context understanding.

Good For

  • Fast-response conversational agents.
  • Low-latency function calling.
  • Local inference for hobbyists and organizations with sensitive data.
  • Programming and math reasoning.
  • Long document understanding and visual analysis.
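For the visual-analysis use case above, an OpenAI-compatible API accepts images as content parts alongside text in a single user message. A sketch of constructing such a message; the question and image URL are illustrative:

```python
def build_vision_message(question: str, image_url: str) -> dict:
    """Build a user message mixing a text part and an image content part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_vision_message(
    "What trend does this chart show?",
    "https://example.com/chart.png",
)
```

The resulting message slots into the `messages` list of a normal chat completion request, so text-only and multimodal turns can be mixed in one conversation.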