chutesai/Mistral-Small-3.1-24B-Instruct-2503

Vision · Concurrency Cost: 2 · Model Size: 24B · Quant: FP8 · Ctx Length: 32k · Published: Mar 24, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

Mistral-Small-3.1-24B-Instruct-2503 is a 24-billion-parameter instruction-finetuned model from Mistral AI that builds on Mistral Small 3 (2501). It adds state-of-the-art vision understanding and extends the context window to 128k tokens while maintaining strong text performance. The model excels at multimodal tasks, advanced reasoning, and agentic workflows with native function calling and JSON output, making it well suited to local deployment and sensitive-data handling.


Overview

Mistral-Small-3.1-24B-Instruct-2503 is an instruction-finetuned model from Mistral AI with 24 billion parameters. It improves on its predecessor, Mistral Small 3 (2501), by adding state-of-the-art vision understanding and expanding the context window to 128k tokens without compromising text performance. The model is designed to be "knowledge-dense" and can be deployed locally, fitting on a single RTX 4090 or a 32GB-RAM MacBook once quantized.
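For orientation, here is a minimal sketch of querying the model through an OpenAI-compatible endpoint, for example a local vLLM deployment. The base URL, API key, and prompt below are placeholder assumptions, not part of the official card:

```python
from openai import OpenAI

# Placeholder endpoint: any OpenAI-compatible server hosting this model
# (e.g. a local vLLM deployment) should accept the same request shape.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumption: local vLLM server
    api_key="EMPTY",                      # local servers often ignore the key
)

response = client.chat.completions.create(
    model="chutesai/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize what instruction finetuning is."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```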

Key Capabilities

  • Vision: Analyzes images and answers questions about visual content alongside text (see the sketch after this list).
  • Multilingual: Supports dozens of languages, including English, French, German, Japanese, Chinese, and Arabic.
  • Agent-Centric: Offers robust agentic capabilities with native function calling and JSON output.
  • Advanced Reasoning: Delivers strong conversational and reasoning performance.
  • Long Context: Features a 128k context window, with strong performance on LongBench v2 and RULER benchmarks.
  • System Prompt Adherence: Maintains strong adherence to system prompts.
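As referenced above, the vision capability can be exercised through the same OpenAI-compatible chat API by embedding an image reference in the message content. The endpoint and image URL here are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint

# Multimodal message: text plus an image reference, in the standard
# OpenAI content-list format accepted by vLLM-style servers.
response = client.chat.completions.create(
    model="chutesai/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.png"},  # placeholder image
                },
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```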

Benchmark Highlights

The model demonstrates competitive performance across various benchmarks:

  • Text Evals: Achieves 80.62% on MMLU and 88.41% on HumanEval.
  • Vision Evals: Scores 64.00% on MMMU and 68.91% on MathVista, outperforming several comparable models.
  • Multilingual Evals: Shows an average of 71.18% across European, East Asian, and Middle Eastern languages.

Good For

  • Fast-response conversational agents.
  • Low-latency function calling (see the sketch after this list).
  • Subject matter experts via fine-tuning.
  • Local inference for hobbyists and organizations handling sensitive data.
  • Programming and math reasoning.
  • Long document understanding.
  • Visual understanding.
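
For the function-calling use case, here is a hedged sketch using the standard OpenAI tools interface. The endpoint, the get_weather tool, and its schema are illustrative placeholders, not a real API:

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint

# Hypothetical tool definition in the OpenAI function-calling schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool, not a real service
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="chutesai/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```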