unsloth/Mistral-Small-24B-Base-2501

Text generation · Model size: 24B · Quant: FP8 · Context length: 32k · Published: Jan 30, 2025 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 2 · Open weights

Mistral-Small-24B-Base-2501 is a 24 billion parameter base language model developed by Mistral AI, serving as the foundation for the instruction-tuned Mistral Small 3. The model is designed to be exceptionally "knowledge-dense" and deployable locally, fitting on a single RTX 4090 or a 32GB RAM MacBook once quantized. It features a 32k context window and a Tekken tokenizer with a 131k vocabulary, making it well suited to applications that need strong language capability at modest hardware cost.


Mistral-Small-24B-Base-2501: A Powerful and Efficient Base Model

Developed by Mistral AI, Mistral-Small-24B-Base-2501 is a 24 billion parameter base model that underpins the instruction-tuned Mistral Small 3. This model is notable for its "knowledge-dense" architecture, offering state-of-the-art capabilities in the sub-70B LLM category. It is designed for efficient deployment, capable of running locally on consumer-grade hardware like an RTX 4090 or a 32GB RAM MacBook after quantization.
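Since the weights are openly licensed, the base model can be loaded directly with Hugging Face `transformers`. The sketch below is a minimal, untested example assuming the `transformers` and `torch` packages are installed and roughly 48 GB of GPU memory is available for unquantized bf16 weights (less with FP8 or 4-bit quants):

```python
MODEL_ID = "unsloth/Mistral-Small-24B-Base-2501"

def load_model(model_id: str = MODEL_ID):
    """Load the base model in bf16, spread across available GPUs."""
    # Imported inside the function so the sketch can be read and
    # inspected without the heavy dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # halves memory vs. fp32
        device_map="auto",           # place layers across available devices
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
    # Base models complete text rather than follow instructions,
    # so prompt with a prefix to be continued.
    inputs = tokenizer("Mistral-Small-24B is", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=30)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because this is a base (non-instruct) checkpoint, it is best used for completion-style prompting or as a starting point for fine-tuning.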

Key Features & Capabilities

  • Multilingual Support: Handles dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
  • Agent-Centric Design: Optimized for agentic tasks with native function calling and JSON output capabilities.
  • Advanced Reasoning: Delivers strong conversational and reasoning performance.
  • Extensive Context Window: Features a 32k token context window for processing longer inputs.
  • System Prompt Adherence: Follows system prompts robustly.
  • Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.
  • Apache 2.0 License: Allows for broad commercial and non-commercial use and modification.
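The native JSON output capability means instruction-tuned derivatives of this base can emit structured tool calls that downstream code validates before dispatch. The helper and schema below are illustrative assumptions, not part of the model card:

```python
import json

def parse_tool_call(raw: str) -> dict:
    """Validate a model-emitted tool call of the (assumed) form
    {"name": "...", "arguments": {...}} and return it as a dict."""
    call = json.loads(raw)
    if not isinstance(call.get("name"), str):
        raise ValueError("tool call missing string 'name'")
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("tool call missing 'arguments' object")
    return call

# Hypothetical model output for a weather tool:
raw = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
call = parse_tool_call(raw)
print(call["name"])  # get_weather
```

Validating before dispatch keeps malformed generations from reaching application code, which matters most in the low-latency agentic loops this model family targets.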

Benchmarks

Human evaluations show Mistral Small 3 (derived from this base) performing competitively against models like Gemma-2-27B and Qwen-2.5-32B, and holding its own against larger models like Llama-3.3-70B and GPT-4o-mini in categories such as reasoning, knowledge, math, coding, and instruction following.

Ideal Use Cases

  • Fast Response Conversational Agents: Its efficiency makes it suitable for interactive applications.
  • Low Latency Function Calling: Excellent for scenarios requiring quick tool use.
  • Subject Matter Experts: Can be fine-tuned for specialized domain knowledge.
  • Local Inference: Perfect for hobbyists and organizations handling sensitive data who require on-device processing.

Popular Sampler Settings

The top parameter combinations used by Featherless users for this model draw on the following samplers:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
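These samplers map onto the fields of an OpenAI-compatible completion request. The payload below is a sketch only: the numeric values are placeholders (the actual top user configs are not reproduced here), and support for extensions such as `min_p` and `repetition_penalty` varies by serving backend:

```python
import json

# Illustrative sampler settings; values are placeholders, not the
# real Featherless user configs.
payload = {
    "model": "unsloth/Mistral-Small-24B-Base-2501",
    "prompt": "Once upon a time",
    "max_tokens": 128,
    "temperature": 0.7,        # randomness of token sampling
    "top_p": 0.9,              # nucleus sampling cutoff
    "top_k": 40,               # keep only the 40 most likely tokens
    "frequency_penalty": 0.0,  # penalize tokens by how often they recur
    "presence_penalty": 0.0,   # penalize tokens that have appeared at all
    "repetition_penalty": 1.1, # multiplicative repeat penalty
    "min_p": 0.05,             # drop tokens below 5% of the top token's probability
}
print(json.dumps(payload, indent=2))
```

Lower temperatures and a modest repetition penalty are common starting points for base models, which tend to loop more readily than their instruction-tuned counterparts.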