nicoboss/Mistral-Small-3.2-24B-Instruct-2506-llamacppfixed

Vision · Concurrency Cost: 2 · Model Size: 24B · Quant: FP8 · Context Length: 32k · Published: Jun 20, 2025 · License: apache-2.0 · Architecture: Transformer

Mistral-Small-3.2-24B-Instruct-2506 is an instruction-tuned causal language model developed by Mistral AI, serving as a minor update to Mistral-Small-3.1-24B-Instruct-2503. This model significantly improves instruction following, reduces repetitive generations, and features a more robust function calling template. It is designed for general assistant tasks, excelling in scenarios requiring precise instruction adherence and reliable tool use, including vision reasoning.


Overview

Mistral-Small-3.2-24B-Instruct-2506 is an updated instruction-tuned model from Mistral AI, building upon its predecessor, Mistral-Small-3.1-24B-Instruct-2503. This version focuses on enhancing core functionalities crucial for reliable AI applications.

Key Improvements

  • Instruction Following: Demonstrates improved accuracy in following precise instructions, with internal benchmarks showing an increase from 82.75% to 84.78% on instruction following tasks.
  • Repetition Errors: Significantly reduces infinite generations and repetitive answers, cutting internal infinite generation rates by nearly half (from 2.11% to 1.29%).
  • Function Calling: Features a more robust function calling template, making it more reliable for tool-use scenarios.
  • Vision Capabilities: Maintains and slightly improves upon the vision reasoning capabilities of its predecessor, as evidenced by benchmarks like ChartQA and DocVQA.
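The more robust function calling can be exercised through any OpenAI-compatible endpoint (vLLM exposes one). As a minimal sketch, the request payload below shows the standard tools schema; the endpoint, tool name, and its parameters are illustrative assumptions, not part of the model card:

```python
import json

# Hypothetical tool definition -- follows the OpenAI-compatible
# "tools" convention accepted by vLLM's chat completions endpoint.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool name
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body for POST /v1/chat/completions on a vLLM server.
payload = {
    "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [get_weather_tool],
    "tool_choice": "auto",
    "temperature": 0.15,  # low temperature, per the usage guidance below
}

print(json.dumps(payload, indent=2))
```

With `tool_choice` set to `"auto"`, the model decides whether to answer directly or emit a structured call to `get_weather`, which the client then executes and feeds back as a tool message.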

Performance Highlights

While largely matching or slightly improving upon Mistral-Small-3.1 in most categories, notable benchmark improvements include:

  • Instruction Following: Wildbench v2 score increased from 55.6% to 65.33%, and Arena Hard v2 from 19.56% to 43.1%.
  • STEM Tasks: Achieved improvements in MMLU Pro (66.76% to 69.06%), MBPP Plus - Pass@5 (74.63% to 78.33%), HumanEval Plus - Pass@5 (88.99% to 92.90%), and SimpleQA (10.43% to 12.10%).

Recommended Usage

This model is recommended for use with vLLM (version 0.9.1 or higher) for optimal performance, particularly for its robust function calling and vision reasoning features. A relatively low temperature (e.g., 0.15) is suggested for best results, along with a system prompt for tailoring its behavior.
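As a sketch of serving the model with vLLM, a launch command might look like the following; the Mistral-specific flags are assumptions drawn from common vLLM usage for Mistral checkpoints, so check the vLLM documentation for your version and hardware:

```shell
# Requires vLLM >= 0.9.1. Flags enable Mistral's tokenizer/weight
# formats and automatic tool-call parsing for function calling.
vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral \
  --tool-call-parser mistral \
  --enable-auto-tool-choice
```

Once the server is up, clients can send chat requests (with a system prompt and `temperature=0.15`) to its OpenAI-compatible `/v1/chat/completions` endpoint.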