mistralai/Mistral-Small-24B-Instruct-2501
Hugging Face
Text Generation · Concurrency Cost: 2 · Model Size: 24B · Quant: FP8 · Context Length: 32k · Published: Jan 28, 2025 · License: apache-2.0 · Architecture: Transformer

Mistral-Small-24B-Instruct-2501 is a 24-billion-parameter instruction-fine-tuned large language model developed by Mistral AI. It offers state-of-the-art conversational and reasoning capabilities comparable to larger models, along with native function calling and JSON output. The model is designed for fast-response conversational agents, low-latency function calling, and local inference: when quantized, it fits on a single RTX 4090 or a MacBook with 32 GB of RAM.


Overview

Mistral-Small-24B-Instruct-2501, developed by Mistral AI, is a 24-billion-parameter instruction-fine-tuned model designed to deliver state-of-the-art capabilities in the "small" LLM category. It is an instruction-tuned version of the Mistral-Small-24B-Base-2501 model and is released under the Apache 2.0 license, allowing broad commercial and non-commercial use. The model features a 32k context window and a Tekken tokenizer with a 131k vocabulary.

Key Capabilities

  • Multilingual Support: Proficient in dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
  • Agent-Centric Design: Excels in agentic tasks with native function calling and JSON output.
  • Advanced Reasoning: Demonstrates strong conversational and reasoning abilities.
  • System Prompt Adherence: Maintains robust adherence to and support for system prompts.
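The native function calling above is typically exercised through an OpenAI-compatible chat API (for example, when the model is served with vLLM). The sketch below builds such a request payload; the endpoint shape follows the common OpenAI-style `tools` schema, and the `get_weather` tool is a hypothetical example, not part of the model.

```python
import json

# A minimal function-calling request body in the OpenAI-compatible format
# commonly used to serve this model. `get_weather` is a hypothetical tool
# defined purely for illustration.
payload = {
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

# Serialize for an HTTP POST to the server's /v1/chat/completions route.
body = json.dumps(payload)
print(len(body) > 0)
```

When the model decides to call the tool, the response carries a `tool_calls` entry with JSON-encoded arguments; your application executes the function and returns the result in a follow-up `tool` message.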

Performance Highlights

Internal human evaluations indicate Mistral-Small-24B-Instruct-2501 performs comparably or favorably against models like Gemma-2-27B and Qwen-2.5-32B on proprietary coding and generalist prompts. Public benchmarks show strong performance in:

  • Reasoning & Knowledge: Achieves 0.663 on mmlu_pro_5shot_cot_instruct and 0.453 on gpqa_main_cot_5shot_instruct.
  • Math & Coding: Scores 0.848 on humaneval_instruct_pass@1 and 0.706 on math_instruct.
  • Instruction Following: Records 8.35 on mtbench_dev and 52.27 on wildbench.

Ideal Use Cases

  • Fast Conversational Agents: Suitable for applications requiring quick responses.
  • Low Latency Function Calling: Optimized for efficient tool use and function execution.
  • Local Inference: Can be deployed on consumer-grade hardware (e.g., RTX 4090 or 32GB RAM MacBook when quantized), making it ideal for hobbyists and organizations with sensitive data requirements.
  • Subject Matter Experts: Can be further fine-tuned for specialized domain knowledge.
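A quick back-of-the-envelope calculation shows why the quantized model fits the hardware named above. This estimates weight memory only; KV cache and activations add overhead, so treat the numbers as lower bounds.

```python
# Approximate weight footprint of a 24B-parameter model at different
# precisions. Weights only -- KV cache and activations are extra.
PARAMS = 24e9  # 24 billion parameters

def weight_gib(bytes_per_param: float) -> float:
    """Weight memory in GiB for a given precision."""
    return PARAMS * bytes_per_param / 2**30

bf16 = weight_gib(2.0)  # ~44.7 GiB: too large for a single 24 GB GPU
fp8 = weight_gib(1.0)   # ~22.4 GiB: fits an RTX 4090 (24 GB), tightly
int4 = weight_gib(0.5)  # ~11.2 GiB: comfortable on a 32 GB RAM MacBook

print(f"bf16: {bf16:.1f} GiB, fp8: {fp8:.1f} GiB, int4: {int4:.1f} GiB")
```

This is why the FP8 quant listed in the header is the natural fit for single-GPU deployment, while 4-bit quants leave headroom for context on unified-memory Macs.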
Popular Sampler Settings

The top three parameter combinations used by Featherless users for this model cover the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.