Overview
Mistral-Small-24B-Base-2501: A Powerful and Efficient Base Model
Developed by Mistral AI, Mistral-Small-24B-Base-2501 is a 24 billion parameter base model that underpins the instruction-tuned Mistral Small 3. This model is notable for its "knowledge-dense" architecture, offering state-of-the-art capabilities in the sub-70B LLM category. It is designed for efficient deployment, capable of running locally on consumer-grade hardware like an RTX 4090 or a 32GB RAM MacBook after quantization.
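Why quantization makes local deployment feasible can be seen with back-of-envelope arithmetic: weight memory scales with bits per parameter, so a 24B model drops from ~48 GB at fp16 to ~12 GB at 4-bit, which fits in an RTX 4090's 24 GB of VRAM or a 32GB MacBook's unified memory. A minimal sketch (weights only, ignoring KV cache and runtime overhead):

```python
# Rough memory estimate for a 24B-parameter model at different
# quantization levels. Weights only; KV cache and runtime overhead
# add to these figures in practice.

PARAMS = 24e9  # 24 billion parameters

def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in gigabytes."""
    return params * bits_per_param / 8 / 1e9

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
# fp16: ~48 GB, int8: ~24 GB, int4: ~12 GB
```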
Key Features & Capabilities
- Multilingual Support: Handles dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- Agent-Centric Design: Optimized for agentic tasks with native function calling and JSON output capabilities.
- Advanced Reasoning: Delivers strong conversational and reasoning performance.
- Extensive Context Window: Features a 32k token context window for processing longer inputs.
- System Prompt Adherence: Provides robust support for, and adherence to, system prompts.
- Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.
- Apache 2.0 License: Allows for broad commercial and non-commercial use and modification.
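The native function calling and JSON output mentioned above means the model can emit structured tool invocations that an application parses and dispatches directly. The snippet below illustrates that pattern with a hypothetical tool call; the schema (`name`/`arguments` keys, the `get_weather` tool) is illustrative only, not Mistral's exact wire format.

```python
import json

# Hypothetical structured output from a function-calling model.
# The "get_weather" tool and the name/arguments layout are
# illustrative assumptions, not Mistral's actual format.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'

# Because the output is valid JSON, it can be parsed and routed
# to the matching tool without brittle text scraping.
call = json.loads(model_output)
print(call["name"])               # get_weather
print(call["arguments"]["city"])  # Paris
```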
Benchmarks
Human evaluations show Mistral Small 3 (derived from this base) performing competitively against models like Gemma-2-27B and Qwen-2.5-32B, and holding its own against the much larger Llama-3.3-70B as well as proprietary models like GPT-4o-mini, in categories such as reasoning, knowledge, math, coding, and instruction following.
Ideal Use Cases
- Fast Response Conversational Agents: Its efficiency makes it suitable for interactive applications.
- Low Latency Function Calling: Excellent for scenarios requiring quick tool use.
- Subject Matter Experts: Can be fine-tuned for specialized domain knowledge.
- Local Inference: Perfect for hobbyists and organizations handling sensitive data who require on-device processing.