unsloth/Phi-4-mini-reasoning
Text generation · Model size: 3.8B · Quant: BF16 · Context length: 32K · Published: May 1, 2025 · License: MIT · Architecture: Transformer · Open weights

Phi-4-mini-reasoning is a 3.8 billion parameter decoder-only Transformer model from the Phi-4 family, developed by Microsoft. Optimized for mathematical reasoning, it supports a 128K token context length and excels at multi-step, logic-intensive problem solving. This compact model is fine-tuned on synthetic math data to deliver high-quality reasoning in memory- and compute-constrained environments.


Overview

Microsoft's Phi-4-mini-reasoning is a 3.8 billion parameter model from the Phi-4 family, specifically designed for advanced mathematical reasoning. It features a 128K token context length and was trained on synthetic, high-quality, reasoning-dense data, then further fine-tuned for stronger math capabilities. Like Phi-4-Mini, its architecture uses a 200K-token vocabulary, grouped-query attention, and tied input/output embeddings.
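A minimal sketch of running the model with the Hugging Face `transformers` library, assuming `transformers` and `torch` are installed and the BF16 weights fit in available memory (the repo id and chat-template usage are standard `transformers` conventions, not taken from this card):

```python
# Sketch: load unsloth/Phi-4-mini-reasoning and answer one math prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Phi-4-mini-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",           # place layers on available GPU(s)/CPU
)

messages = [
    {"role": "user", "content": "How many positive divisors does 360 have?"},
]
# Build the model's chat-formatted input from the message list.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For reasoning models of this kind, a generous `max_new_tokens` budget is advisable, since the model emits intermediate reasoning steps before its final answer.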

Key Capabilities

  • Multi-step Mathematical Problem-Solving: Excels at complex math problems, formal proof generation, symbolic computation, and advanced word problems.
  • Efficiency: Optimized for memory/compute constrained environments and latency-bound scenarios, making it suitable for edge or mobile deployment.
  • Knowledge Distillation: Fine-tuned on synthetic math data generated by a more capable model (DeepSeek-R1), comprising over one million diverse math problems with verified solutions.
  • Performance: Achieves competitive scores on reasoning benchmarks like AIME (57.5), MATH-500 (94.6), and GPQA Diamond (52.0), often outperforming larger models in its class.

Good For

  • Mathematical Reasoning Applications: Ideal for tasks requiring deep analytical thinking and structured logic.
  • Educational Tools: Potentially suitable for embedded tutoring and other educational applications.
  • Resource-Constrained Deployments: Designed for scenarios where computing power or latency is limited.

Limitations

  • Primarily designed and tested for math reasoning; not evaluated for all downstream purposes.
  • Limited capacity to store factual knowledge due to its small size, which may lead to factual errors; this can be mitigated with retrieval-augmented generation (RAG).
  • Performance disparities exist across non-English languages and less represented English varieties.
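The RAG mitigation mentioned above amounts to retrieving relevant reference text and prepending it to the prompt so the model can ground factual claims. A minimal sketch, with a deliberately naive keyword-overlap retriever as a stand-in (a real deployment would use an embedding model and vector store; all names here are illustrative):

```python
# Sketch of retrieval-augmented prompting: fetch the most relevant
# documents for a query and splice them into the prompt as context.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (illustrative only)."""
    terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved context so the model answers from evidence."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return (
        "Use the following context to answer the question.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

corpus = [
    "The Eiffel Tower is 330 metres tall.",
    "Mount Everest is 8849 metres tall.",
    "Phi-4-mini-reasoning has 3.8 billion parameters.",
]
print(build_prompt("How tall is the Eiffel Tower?", corpus))
```

The augmented prompt then goes to the model as the user message; the model's limited parametric knowledge matters less because the answer is present in the supplied context.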