RedHatAI/Mistral-Small-24B-Instruct-2501

Text Generation · Concurrency Cost: 2 · Model Size: 24B · Quant: FP8 · Context Length: 32k · Published: May 9, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

RedHatAI/Mistral-Small-24B-Instruct-2501 is a 24 billion parameter instruction-tuned causal language model developed by Mistral AI. The model is optimized for agentic capabilities, including native function calling and reliable JSON output, and offers strong reasoning. It supports a 32K token context window and is multilingual, making it well suited to fast-response conversational agents and low-latency function calling.
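As a minimal sketch of basic chat inference, the snippet below runs the model with the Hugging Face transformers pipeline. The model ID is taken from this card; the prompt, generation settings, and hardware assumption (a GPU setup with enough memory for the 24B weights) are illustrative.

```python
# Minimal chat-inference sketch (assumes transformers is installed and there is
# enough GPU memory for the 24B weights; prompt and settings are illustrative).
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="RedHatAI/Mistral-Small-24B-Instruct-2501",
    torch_dtype="auto",   # let transformers pick an appropriate dtype
    device_map="auto",    # spread the model across available devices
)

messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Summarize what a 32K context window allows."},
]

out = pipe(messages, max_new_tokens=256)
# For chat-style input, the last message in generated_text is the reply.
print(out[0]["generated_text"][-1]["content"])
```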


Overview

RedHatAI/Mistral-Small-24B-Instruct-2501 is a 24 billion parameter instruction-tuned model from Mistral AI, designed to offer capabilities comparable to larger models while remaining efficient. It is an instruction-fine-tuned version of the Mistral-Small-24B-Base-2501 model and is validated for deployment on Red Hat AI platforms. The model is notably "knowledge-dense" and, once quantized, can fit on a single RTX 4090 or a MacBook with 32 GB of RAM, making it well suited to local inference.
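As a sketch of that local-inference scenario, the snippet below loads the model with vLLM's offline Python API. The model ID comes from this card; the context-length cap and sampling values are assumptions chosen to fit a single-GPU setup.

```python
# Local-inference sketch with vLLM's offline API (assumes vllm is installed and
# a GPU with sufficient memory; max_model_len and sampling values are illustrative).
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/Mistral-Small-24B-Instruct-2501",
    max_model_len=32768,  # the card's advertised 32K context window
)

params = SamplingParams(temperature=0.15, max_tokens=256)

conversation = [
    {"role": "user", "content": "Give three use cases for an on-prem 24B model."},
]

outputs = llm.chat(conversation, params)
print(outputs[0].outputs[0].text)
```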

Key Capabilities

  • Multilingual Support: Handles dozens of languages including English, French, German, Spanish, Italian, Chinese, Japanese, and Korean.
  • Agent-Centric Design: Excels in agentic tasks with native function calling and reliable JSON output (a tool-calling sketch follows this list).
  • Advanced Reasoning: Provides strong conversational and reasoning abilities.
  • Extensive Context Window: Features a 32K token context window for handling longer interactions.
  • System Prompt Adherence: Maintains robust adherence to system prompts.
  • Apache 2.0 License: Allows for both commercial and non-commercial use and modification.
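
A hedged sketch of the agent-centric usage: once the model is served behind an OpenAI-compatible endpoint (for example, a local vLLM server), it can be queried with tool definitions. The endpoint URL, API key, and the `get_weather` tool schema below are illustrative assumptions, not part of this card.

```python
# Function-calling sketch against an OpenAI-compatible server hosting the model
# (base_url, api_key, and the tool schema are illustrative assumptions).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="RedHatAI/Mistral-Small-24B-Instruct-2501",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decides to call the tool, the structured call arrives here.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```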

Performance Highlights

Human evaluations indicate that Mistral-Small-24B-Instruct-2501 performs competitively against models like Gemma-2-27B and Qwen-2.5-32B, and even against larger models like Llama-3.3-70B and GPT-4o-mini in certain categories. Public benchmarks show strong results in Reasoning & Knowledge (e.g., 0.663 on MMLU), Math & Coding (e.g., 0.848 on HumanEval), and Instruction Following (e.g., 8.35 on MTBench).

Good For

  • Fast-Response Conversational Agents: Its efficiency and reasoning capabilities make it suitable for interactive applications (a streaming sketch follows this list).
  • Low-Latency Function Calling: Optimized for scenarios requiring quick execution of functions.
  • Subject Matter Experts: Can be fine-tuned for specialized domain knowledge.
  • Local Inference: Ideal for hobbyists and organizations handling sensitive data due to its deployability on consumer-grade hardware.
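
As a sketch of the fast-response, low-latency usage pattern, the snippet below streams tokens from an OpenAI-compatible endpoint as they are generated, so an interactive client can render output immediately. The endpoint details and prompt are illustrative assumptions.

```python
# Streaming sketch for low-latency conversational use (endpoint and prompt are
# illustrative; assumes the model is served behind an OpenAI-compatible API).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="RedHatAI/Mistral-Small-24B-Instruct-2501",
    messages=[{"role": "user", "content": "Draft a two-sentence status update."}],
    stream=True,
    max_tokens=128,
)

# Print tokens as they arrive instead of waiting for the full completion.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```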