chutesai/Mistral-Small-3.2-24B-Instruct-2506

Modalities: Vision · Concurrency Cost: 2 · Model Size: 24B · Quantization: FP8 · Context Length: 32K · Published: Jun 21, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

Mistral-Small-3.2-24B-Instruct-2506 is a 24 billion parameter instruction-tuned language model developed by Mistral AI, building upon the Mistral-Small-3.1-24B-Instruct-2503 series. This model features improved instruction following, reduced repetition errors, and a more robust function calling template. It also supports multimodal inputs, including vision, and is optimized for general instruction-following tasks with a 32K context length.


Overview

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24 billion parameter instruction-tuned model from Mistral AI that improves on its predecessor, Mistral-Small-3.1-24B-Instruct-2503. This iteration focuses on refining core capabilities crucial for reliable AI applications.

Key Improvements & Capabilities

  • Enhanced Instruction Following: Demonstrates better adherence to precise instructions, with significant gains on benchmarks like Wildbench v2 (65.33%) and Arena Hard v2 (43.1%).
  • Reduced Repetition Errors: Significantly decreases infinite generations and repetitive outputs, showing a 2x reduction on challenging prompts.
  • Robust Function Calling: Features an improved and more reliable function calling template, facilitating better integration with tools and agents.
  • Multimodal Support: Retains vision capabilities, allowing for reasoning over image inputs, as demonstrated in examples involving visual scenarios.
  • General Performance: Maintains or slightly improves performance across various categories, including STEM benchmarks like MMLU Pro (69.06%), MBPP Plus - Pass@5 (78.33%), and HumanEval Plus - Pass@5 (92.90%).

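To make the function-calling improvement concrete, here is a minimal sketch of an OpenAI-style tool definition such as one might pass to an OpenAI-compatible endpoint serving this model. The `get_weather` tool, its schema, and the `make_tool` helper are hypothetical illustrations, not part of the model card.

```python
# Hypothetical example: defining a tool in the OpenAI-compatible format
# commonly used when calling models that support function calling.
def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON-Schema parameter spec in the OpenAI-style tool format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }

# The get_weather tool below is a made-up example for illustration.
tools = [
    make_tool(
        "get_weather",
        "Look up the current weather for a city.",
        {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    )
]
```

A `tools` list like this is typically passed alongside the chat messages; the model then emits a structured tool call rather than free text when it decides a tool is needed.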
Recommended Usage

This model is well-suited for applications requiring precise instruction adherence, reliable function calling, and multimodal understanding. It is recommended to use a low temperature (e.g., 0.15) and provide a system prompt for optimal performance. The model can be deployed efficiently using vLLM (recommended) or Transformers frameworks.
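The recommendations above can be sketched as a request payload for an OpenAI-compatible server (such as one run with vLLM). The system prompt wording and `max_tokens` value are illustrative assumptions; only the model name, the low temperature, and the use of a system prompt come from the card.

```python
import json

def build_chat_request(user_message: str) -> dict:
    """Assemble a chat-completion payload following the card's recommendations."""
    return {
        "model": "chutesai/Mistral-Small-3.2-24B-Instruct-2506",
        "messages": [
            # Providing a system prompt is recommended for optimal performance.
            {"role": "system", "content": "You are a helpful assistant."},  # wording is an assumption
            {"role": "user", "content": user_message},
        ],
        # A low temperature (e.g., 0.15) is recommended for this model.
        "temperature": 0.15,
        "max_tokens": 512,  # arbitrary illustrative limit
    }

payload = build_chat_request("Summarize the model's key improvements.")
print(json.dumps(payload, indent=2))
```

The same payload shape works whether the model is served via vLLM's OpenAI-compatible server or another compatible gateway.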