unsloth/phi-4-reasoning
Phi-4-reasoning is a 14.7 billion parameter dense decoder-only Transformer model developed by Microsoft Research, fine-tuned for advanced reasoning tasks. It excels in math, science, and coding, leveraging supervised fine-tuning on chain-of-thought traces and reinforcement learning. With a 32k token context length, it is optimized for memory/compute-constrained and latency-bound environments requiring strong logical capabilities. The model outputs generated text with a distinct reasoning chain-of-thought block followed by a summarization block.
Overview
Phi-4-reasoning is a 14.7 billion parameter model from Microsoft Research, fine-tuned specifically for advanced reasoning. It builds on the Phi-4 architecture, combining supervised fine-tuning on a dataset rich in chain-of-thought traces with reinforcement learning. The training data emphasizes high-quality math, science, and coding material, alongside alignment data for safety.
Key Capabilities
- Advanced Reasoning: Designed to excel in complex reasoning tasks across math, science, and coding.
- Chain-of-Thought Output: Generates responses with a detailed reasoning process (<think> block) followed by a concise solution (<solution> block).
- Optimized Performance: Suitable for memory/compute-constrained and latency-bound environments due to its efficient design.
- Extensive Context: Supports a 32k token context length, allowing for longer and more complex queries.
- Strong Benchmarks: Demonstrates competitive performance on reasoning benchmarks such as AIME, OmniMath, GPQA-Diamond, and LiveCodeBench, often outperforming larger models.
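Because responses carry a reasoning block followed by a solution, downstream code usually wants to separate the two. The sketch below is one way to do that with the standard library. It assumes the blocks are closed with `</think>` and `</solution>` tags, which this card does not state explicitly, and the names `ParsedResponse` and `split_response` are hypothetical helpers, not part of any library. If no `<solution>` tag is found, it falls back to treating everything after the reasoning block as the solution.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    reasoning: Optional[str]  # contents of the <think> block, if present
    solution: str             # the final answer text


# Assumption: blocks are delimited by matching open/close tags.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_SOLUTION_RE = re.compile(r"<solution>(.*?)</solution>", re.DOTALL)


def split_response(text: str) -> ParsedResponse:
    """Split raw model output into its reasoning and solution parts."""
    think = _THINK_RE.search(text)
    reasoning = think.group(1).strip() if think else None

    # Only look for the solution after the reasoning block ends.
    remainder = text[think.end():] if think else text
    sol = _SOLUTION_RE.search(remainder)

    # Fallback: no explicit <solution> tag, so use the remaining text.
    solution = sol.group(1).strip() if sol else remainder.strip()
    return ParsedResponse(reasoning, solution)
```

A caller would typically show only `solution` to end users and keep `reasoning` for logging or debugging.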
Good For
- Accelerating research in language models, serving as a building block for generative AI features.
- General-purpose AI systems and applications (primarily in English) that require strong reasoning and logic.
- Use cases where memory/compute resources are limited or low latency is critical.
- Tasks involving complex problem-solving in mathematics, scientific inquiry, and code generation.
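To judge whether the model fits a memory-constrained environment, a rough weight-only estimate can help. The sketch below multiplies the 14.7 billion parameter count from this card by the bytes per parameter of a few common precisions. The precision table and the function name `weight_memory_gb` are illustrative assumptions. The figure covers weights only: the KV cache, activations, and any quantization metadata add to it, so treat the result as a lower bound.

```python
# Parameter count taken from the model description above.
PARAMS = 14.7e9

# Assumed storage cost per parameter for common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.1f} GB")
```

At 2 bytes per parameter (bf16) the weights alone come to roughly 29.4 GB, which is why lower-precision variants are the usual choice on smaller GPUs.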