# microsoft/phi-4
microsoft/phi-4 is a 14.7 billion parameter dense decoder-only Transformer model developed by Microsoft Research. Trained on 9.8 trillion tokens, it is designed to accelerate language model research and to serve as a building block for generative AI features. Its strong reasoning and logic performance, combined with its compact size, make it well suited to memory- and compute-constrained and latency-bound environments.
## Model Overview
phi-4 is a 14.7 billion parameter dense decoder-only Transformer model developed by Microsoft Research. It was trained on a diverse dataset of 9.8 trillion tokens, including synthetic data, filtered public domain websites, and acquired academic books and Q&A datasets. The model underwent rigorous enhancement and alignment using supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. It supports a context length of 16K tokens and is best suited for prompts in a chat format.
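Since the model works best with chat-format prompts, a caller typically renders a message list into the model's chat template before tokenizing. The sketch below assumes the ChatML-style special tokens (`<|im_start|>`, `<|im_sep|>`, `<|im_end|>`) published with phi-4's chat template; in practice, prefer the tokenizer's own `apply_chat_template` so the exact tokens are never hand-written.

```python
def format_phi4_chat(messages: list[dict]) -> str:
    """Render a message list into a phi-4-style chat prompt.

    The special-token names below are an assumption based on the
    published chat template; use tokenizer.apply_chat_template in
    real code rather than hard-coding them.
    """
    parts = [
        f"<|im_start|>{m['role']}<|im_sep|>{m['content']}<|im_end|>"
        for m in messages
    ]
    # Open an assistant turn so the model continues as the assistant.
    parts.append("<|im_start|>assistant<|im_sep|>")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]
prompt = format_phi4_chat(messages)
```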
## Key Capabilities
- Advanced Reasoning: Trained with an emphasis on high-quality and synthetic data to strengthen reasoning and logic.
- Instruction Adherence: Rigorously fine-tuned for precise instruction following.
- Safety & Alignment: Incorporates robust safety post-training with SFT and iterative DPO, evaluated through quantitative benchmarks and red-teaming.
- Performance: Demonstrates strong performance across various benchmarks, including MMLU (84.8), GPQA (56.1), MATH (80.4), and HumanEval (82.6), often outperforming similarly sized models.
## Intended Use Cases
phi-4 is primarily designed to accelerate research on language models and serve as a foundational component for generative AI applications. It is particularly well-suited for:
- Resource-Constrained Environments: Ideal for scenarios with limited memory or computational resources.
- Latency-Sensitive Applications: Performs efficiently in use cases requiring low latency.
- Reasoning and Logic Tasks: Excels in applications demanding strong reasoning and logical capabilities.
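In latency- and memory-constrained deployments, callers commonly budget tokens against the model's 16K context window before issuing a request. A minimal sketch of that bookkeeping, assuming a 16,384-token window (the helper names are illustrative, not part of any phi-4 API):

```python
MAX_CONTEXT = 16_384  # phi-4's context window in tokens (16K)

def fits_context(prompt_tokens: int, max_new_tokens: int,
                 max_context: int = MAX_CONTEXT) -> bool:
    """True if the prompt plus the generation budget fits the window."""
    return prompt_tokens + max_new_tokens <= max_context

def trim_budget(prompt_tokens: int, max_new_tokens: int,
                max_context: int = MAX_CONTEXT) -> int:
    """Shrink the generation budget so the request fits the window.

    Returns the number of new tokens actually allowed (never negative).
    """
    return max(0, min(max_new_tokens, max_context - prompt_tokens))
```

For example, a 16,000-token prompt leaves only 384 tokens of headroom, so a requested 1,000-token completion would be trimmed to 384.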
## Limitations
- Primarily trained on English text, leading to reduced performance in other languages.
- May exhibit biases or generate inappropriate content despite safety measures.
- Can produce inaccurate or outdated information, requiring developers to implement safeguards like Retrieval Augmented Generation (RAG) for critical applications.
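The RAG safeguard mentioned above amounts to retrieving trusted passages and prepending them to the prompt so the model answers from supplied context rather than from parametric memory. A minimal sketch, using a toy word-overlap scorer as a stand-in for a real embedding-based retriever (all names here are illustrative):

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and return the top-k.

    A real system would use an embedding index; overlap scoring just
    keeps the sketch self-contained.
    """
    q = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved passages so the model answers from them."""
    context = "\n".join(f"- {d}" for d in retrieve(query, corpus))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

Grounding the prompt this way reduces, but does not eliminate, the risk of inaccurate or outdated answers; critical applications should still verify outputs.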