unsloth/phi-4-reasoning

Text generation · Concurrency cost: 1 · Model size: 14.7B · Quant: FP8 · Context length: 32k · Published: May 1, 2025 · License: MIT · Architecture: Transformer · Open weights

Phi-4-reasoning is a 14.7 billion parameter dense decoder-only Transformer model developed by Microsoft Research and fine-tuned for advanced reasoning tasks. It excels in math, science, and coding, leveraging supervised fine-tuning on chain-of-thought traces plus reinforcement learning. With a 32k token context length, it is optimized for memory/compute-constrained and latency-bound environments that require strong logical capabilities. The model's output consists of a chain-of-thought reasoning block followed by a summarization block.
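The two-part output can be separated with a small helper. A minimal sketch, assuming the reasoning block is delimited by `<think>...</think>` tags and everything after the closing tag is the summarized answer:

```python
import re


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into its chain-of-thought and final-answer parts.

    Assumes the reasoning is wrapped in <think>...</think>; if the tags
    are absent, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    thought = match.group(1).strip()
    answer = text[match.end():].strip()
    return thought, answer


# Hypothetical completion, for illustration only.
sample = "<think>2 + 2 = 4, so the sum is 4.</think>The answer is 4."
thought, answer = split_reasoning(sample)
```

Downstream code can then log or hide the `thought` part while surfacing only `answer` to users.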


Overview

Phi-4-reasoning is a 14.7 billion parameter model from Microsoft Research, specifically fine-tuned for advanced reasoning. It builds upon the Phi-4 architecture, combining supervised fine-tuning on a dataset rich in chain-of-thought traces with reinforcement learning. Training focuses on high-quality data for math, science, and coding skills, alongside alignment data for safety.

Key Capabilities

  • Advanced Reasoning: Designed to excel in complex reasoning tasks across math, science, and coding.
  • Chain-of-Thought Output: Generates responses with a detailed reasoning process (<think> block) followed by a concise solution (<solution> block).
  • Optimized Performance: Suitable for memory/compute-constrained and latency-bound environments due to its efficient design.
  • Extensive Context: Supports a 32k token context length, allowing for longer and more complex queries.
  • Strong Benchmarks: Demonstrates competitive performance on reasoning benchmarks such as AIME, OmniMath, GPQA-Diamond, and LiveCodeBench, often outperforming substantially larger models.
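These capabilities are exercised through the standard `transformers` text-generation API. A minimal sketch, with some assumptions: the repo id below matches this card, the tokenizer ships a chat template, and the sampling settings are illustrative. The heavy imports and weight download happen only when `generate` is actually called:

```python
MODEL_ID = "unsloth/phi-4-reasoning"  # assumed repo id, matching this card


def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat format expected by the
    tokenizer's chat template."""
    return [{"role": "user", "content": question}]


def generate(question: str, max_new_tokens: int = 2048) -> str:
    """Load the model and produce a reasoning-style completion.

    transformers/torch are imported lazily so this module can be used
    (e.g. for message construction) without the 14.7B weights present.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    input_ids = tokenizer.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    # Reasoning traces are long: leave generous room for new tokens.
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )


messages = build_messages("What is the 10th Fibonacci number?")
```

With the 32k context window, long prompts fit, but budget output space for the chain-of-thought block, which typically dominates the completion length.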

Good For

  • Accelerating research in language models, serving as a building block for generative AI features.
  • General-purpose AI systems and applications (primarily in English) that require strong reasoning and logic.
  • Use cases where memory/compute resources are limited or low latency is critical.
  • Tasks involving complex problem-solving in mathematics, scientific inquiry, and code generation.