csalab/Magistral-24B

TEXT GENERATION · Concurrency Cost: 2 · Model Size: 24B · Quant: FP8 · Ctx Length: 32k · Published: Jun 27, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

The csalab/Magistral-24B is a 24-billion-parameter language model developed by csalab, built on Mistral Small 3.1 with enhanced reasoning capabilities. It is optimized for complex reasoning tasks and can generate long chains of thought before providing an answer. The model is multilingual, supporting dozens of languages, and is released under the Apache 2.0 license, making it suitable for a wide range of commercial and non-commercial applications that require advanced reasoning and broad language support.


Magistral-24B: An Enhanced Reasoning Model

The csalab/Magistral-24B is a 24-billion-parameter language model, an evolution of Mistral Small 3.1 specifically enhanced for advanced reasoning. It combines Supervised Fine-Tuning (SFT) on Magistral Medium traces with Reinforcement Learning (RL) to improve its ability to generate detailed thought processes before arriving at a final answer.

Key Capabilities

  • Advanced Reasoning: Excels at tasks requiring long chains of reasoning, drafting internal thoughts inside <think> tags before producing a concise summary.
  • Multilingual Support: Capable of processing and generating text in dozens of languages, including English, French, German, Japanese, Korean, Chinese, and Arabic.
  • Deployment Flexibility: Designed to be efficient enough for local deployment, fitting on a single RTX 4090 or a 32GB RAM MacBook when quantized.
  • Open Licensing: Released under the Apache 2.0 license, permitting broad commercial and non-commercial use and modification.
  • Context Window: Features a 128k context window, though performance is best when inputs stay at or below roughly 40k tokens.
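Because the model drafts its reasoning inside <think> tags before the final summary, downstream code typically needs to separate the two. Below is a minimal sketch in Python; the `<think>...</think>` delimiter format follows the description above, but the exact tags emitted by a given deployment are an assumption worth verifying against real outputs.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, answer).

    Assumes the reasoning is wrapped in <think>...</think> tags,
    as described in this model card; responses without the tags
    are returned unchanged as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Hypothetical raw output, for illustration only.
raw = "<think>2 + 2 equals 4 because ...</think>The answer is 4."
thoughts, answer = split_reasoning(raw)
print(answer)  # -> The answer is 4.
```

Keeping the split in one small helper makes it easy to log or hide the reasoning trace while showing users only the concise answer.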

Performance Highlights

Magistral-24B demonstrates strong performance on reasoning benchmarks, achieving 70.68% pass@1 on AIME24 and 68.18% on GPQA Diamond, positioning it as a capable model for complex problem-solving. For best results, use the recommended sampling parameters (top_p: 0.95, temperature: 0.7, max_tokens: 40960) and the structured chat template, in particular the system prompt that elicits reasoning traces.
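The recommended settings above can be collected into a chat-completion payload for an OpenAI-compatible serving stack. This is a sketch only: the system-prompt wording is a placeholder illustrating the reasoning-trace pattern, not the model's official template, and the model identifier should match whatever name your server registers.

```python
import json

# Sampling parameters recommended in the model card.
SAMPLING = {
    "top_p": 0.95,
    "temperature": 0.7,
    "max_tokens": 40960,
}

def build_request(user_prompt: str, system_prompt: str) -> dict:
    """Assemble a chat-completion payload with the recommended
    sampling parameters. The model name below is a placeholder."""
    return {
        "model": "csalab/Magistral-24B",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        **SAMPLING,
    }

# Placeholder system prompt; consult the model's chat template
# for the exact wording it was trained with.
payload = build_request(
    "How many prime numbers are there below 30?",
    "First draft your reasoning inside <think> tags, then give a concise answer.",
)
print(json.dumps(payload, indent=2))
```

The generous max_tokens budget matters here: long reasoning traces can consume tens of thousands of tokens before the final answer appears, so a small limit may truncate the response mid-thought.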