Magistral-Small-2506 Overview
Magistral-Small-2506 is a 24-billion-parameter language model from Mistral AI, based on Mistral Small 3.1. It has been enhanced with reasoning capabilities through Supervised Fine-Tuning (SFT) on Magistral Medium traces, followed by Reinforcement Learning (RL). The model is optimized for complex reasoning tasks and generates long chains of thought before providing an answer.
Key Features
- Enhanced Reasoning: Specifically designed to perform detailed reasoning traces.
- Multilingual Support: Supports a wide array of languages including English, French, German, Japanese, Chinese, and many others.
- Flexible Deployment: Can be deployed locally, fitting on an RTX 4090 or a 32GB RAM MacBook after quantization.
- Apache 2.0 License: Allows for broad commercial and non-commercial use and modification.
- Extended Context Window: Features a 128k context window, with optimal performance recommended up to 40k tokens.
Performance Highlights
While Magistral Medium scores slightly higher, Magistral Small demonstrates strong performance across benchmarks including AIME24, AIME25, GPQA Diamond, and LiveCodeBench (v5).
Recommended Usage
For optimal results, use the sampling parameters temperature: 0.7, top_p: 0.95, and max_tokens: 40960 (matching the recommended ~40k-token reasoning budget). The model also benefits from the recommended chat template, whose system prompt instructs a structured thinking process: an inner monologue first, then a concise summary. Inference is recommended via vLLM, with community-prepared quantized versions available for llama.cpp, LM Studio, Ollama, and Unsloth.
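The recommended settings above can be sketched as a request to a vLLM server's OpenAI-compatible endpoint. This is a minimal illustration: the system-prompt wording and model identifier here are assumptions for demonstration, not taken verbatim from the official chat template.

```python
# Sketch: assembling a chat request with the recommended sampling parameters
# for a vLLM OpenAI-compatible endpoint. The system-prompt text and model
# name below are illustrative assumptions; use the official chat template
# and your deployment's model id in practice.

REASONING_SYSTEM_PROMPT = (
    "First work through the problem as an inner monologue, "
    "then give a concise summary of your final answer."
)  # hypothetical wording, stands in for the official template's prompt

def build_request(user_message: str) -> dict:
    """Assemble the JSON payload with the recommended sampling settings."""
    return {
        "model": "mistralai/Magistral-Small-2506",  # assumed identifier
        "messages": [
            {"role": "system", "content": REASONING_SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        # Recommended sampling parameters from this card:
        "temperature": 0.7,
        "top_p": 0.95,
        "max_tokens": 40960,
    }

payload = build_request("How many prime numbers are below 50?")
```

The payload would then be POSTed to the server (e.g. with an HTTP client); only the sampling fields and message layout are the point here.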