mistralai/Mistral-Small-24B-Instruct-2501

Parameters: 24B
Quantization: FP8
Context length: 32,768 tokens
Released: Jan 28, 2025
License: apache-2.0
Model page: Hugging Face
Overview

Mistral-Small-24B-Instruct-2501, developed by Mistral AI, is a 24 billion parameter instruction-fine-tuned model designed to deliver state-of-the-art capabilities in the "small" LLM category. It is an instruction-tuned version of the Mistral-Small-24B-Base-2501 model and is released under an Apache 2.0 License, allowing broad commercial and non-commercial use. The model features a 32k context window and a 131k vocabulary Tekken tokenizer.
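
As a quick illustration of the paragraph above, the snippet below is a minimal sketch of loading and prompting the model with Hugging Face transformers; it assumes a recent transformers release with chat-template support and enough GPU memory for the unquantized weights, and the generation settings are illustrative rather than official recommendations.

```python
# Minimal sketch: chat with Mistral-Small-24B-Instruct-2501 via transformers.
# Assumes `transformers`, `torch`, and `accelerate` are installed and the
# unquantized weights fit in available GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "In one sentence, what does a 32k context window let me do?"},
]

# The chat template applies the instruction formatting the model was tuned on.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.15)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```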

Key Capabilities

  • Multilingual Support: Proficient in dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
  • Agent-Centric Design: Excels in agentic tasks with native function calling and structured JSON output (see the function-calling sketch after this list).
  • Advanced Reasoning: Demonstrates strong conversational and reasoning abilities.
  • System Prompt Adherence: Maintains robust adherence to and support for system prompts.
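
As a sketch of the agentic flow mentioned above, the example below exposes a Python function to the model through transformers' chat-template `tools` argument so it can emit a structured call; the `get_weather` tool and the overall round-trip are illustrative assumptions, not part of the official model card.

```python
# Minimal function-calling sketch; the tool below is a hypothetical example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

def get_weather(city: str) -> str:
    """Return the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 21 C"  # placeholder implementation

messages = [{"role": "user", "content": "What is the weather in Paris right now?"}]

# Passing the function via `tools=` lets the chat template describe its JSON
# schema to the model, which can then respond with a structured tool call.
inputs = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# The completion contains the tool call (function name plus JSON arguments);
# the application parses it, runs get_weather, appends the result as a "tool"
# message, and generates again to get the final natural-language answer.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```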

Performance Highlights

Internal human evaluations indicate that Mistral-Small-24B-Instruct-2501 performs on par with or better than models such as Gemma-2-27B and Qwen-2.5-32B on proprietary coding and generalist prompts. Public benchmarks show strong performance in:

  • Reasoning & Knowledge: Achieves 0.663 on mmlu_pro_5shot_cot_instruct and 0.453 on gpqa_main_cot_5shot_instruct.
  • Math & Coding: Scores 0.848 on humaneval_instruct_pass@1 and 0.706 on math_instruct.
  • Instruction Following: Records 8.35 on mtbench_dev and 52.27 on wildbench.

Ideal Use Cases

  • Fast Conversational Agents: Suitable for applications requiring quick responses.
  • Low Latency Function Calling: Optimized for efficient tool use and function execution.
  • Local Inference: Can be deployed on consumer-grade hardware (e.g., an RTX 4090 or a MacBook with 32 GB of RAM once quantized), making it ideal for hobbyists and organizations with sensitive data requirements (see the quantized-loading sketch after this list).
  • Subject Matter Experts: Can be further fine-tuned for specialized domain knowledge.
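
For the local-inference use case above, one common route is 4-bit quantization with bitsandbytes; the sketch below illustrates that approach under the assumption that `bitsandbytes` and a CUDA GPU are available, and the memory comment is an estimate rather than a measured figure.

```python
# Minimal sketch: 4-bit local inference with bitsandbytes quantization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # at 4-bit, the 24B weights should roughly fit on a single ~24 GB GPU such as an RTX 4090
)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Give me three reasons to run an LLM locally."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(prompt, max_new_tokens=200)
print(tokenizer.decode(outputs[0][prompt.shape[-1]:], skip_special_tokens=True))
```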