microsoft/phi-4

Text Generation · Concurrency Cost: 1 · Model Size: 14.7B · Quant: FP8 · Context Length: 16K · Published: Dec 11, 2024 · License: MIT · Architecture: Transformer · Open Weights

microsoft/phi-4 is a 14.7 billion parameter dense decoder-only Transformer model developed by Microsoft Research. Trained on 9.8 trillion tokens, it is designed to accelerate language model research and to serve as a building block for generative AI features. It excels at reasoning and logic tasks, and its relatively small size makes it well suited to memory- and compute-constrained, latency-sensitive environments.


Model Overview

phi-4 is a 14.7 billion parameter dense decoder-only Transformer model developed by Microsoft Research. It was trained on a diverse dataset of 9.8 trillion tokens, including synthetic data, filtered public domain websites, and acquired academic books and Q&A datasets. The model underwent rigorous enhancement and alignment using supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. It supports a context length of 16K tokens and is best suited for prompts in a chat format.
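
The sketch below shows one way to send a chat-format prompt to phi-4. It is a minimal example that assumes the Hugging Face transformers pipeline API and a GPU with enough memory for the 14.7B weights; the dtype and generation settings are illustrative, not prescriptive.

```python
# Minimal sketch: querying phi-4 with a chat-format prompt via the
# Hugging Face transformers text-generation pipeline.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/phi-4",
    torch_dtype=torch.bfloat16,  # half precision; FP32 weights would not fit most single GPUs
    device_map="auto",
)

# phi-4 is tuned for chat-style prompts; the pipeline applies the model's
# chat template automatically when given a list of messages.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain step by step why 17 is prime."},
]

result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```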

Key Capabilities

  • Advanced Reasoning: Trained with an emphasis on high-quality and synthetic data to strengthen reasoning and logic capabilities.
  • Instruction Adherence: Rigorously fine-tuned for precise instruction following.
  • Safety & Alignment: Incorporates robust safety post-training with SFT and iterative DPO, evaluated through quantitative benchmarks and red-teaming.
  • Performance: Demonstrates strong performance across various benchmarks, including MMLU (84.8), GPQA (56.1), MATH (80.4), and HumanEval (82.6), often outperforming similarly sized models.

Intended Use Cases

phi-4 is primarily designed to accelerate research on language models and serve as a foundational component for generative AI applications. It is particularly well-suited for:

  • Resource-Constrained Environments: Ideal for scenarios with limited memory or computational resources (see the loading sketch after this list).
  • Latency-Sensitive Applications: Performs efficiently in use cases requiring low latency.
  • Reasoning and Logic Tasks: Excels in applications demanding strong reasoning and logical capabilities.
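
As a hedged illustration of the resource-constrained case, the following sketch loads phi-4 with 4-bit weight quantization via bitsandbytes to cut memory footprint. The library choices and settings here are assumptions, not the only (or an officially recommended) deployment path.

```python
# Minimal sketch: loading phi-4 under a tight memory budget using 4-bit
# quantization (assumes transformers, accelerate, and bitsandbytes with a CUDA GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store weights in 4-bit, compute in bf16
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=quant_config,
    device_map="auto",  # lets accelerate spill layers to CPU if the GPU is too small
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```

Four-bit storage roughly quarters the weight memory relative to FP16, at some cost in accuracy, which is one pragmatic trade-off for the memory- and latency-constrained settings described above.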

Limitations

  • Primarily trained on English text, leading to reduced performance in other languages.
  • May exhibit biases or generate inappropriate content despite safety measures.
  • Can produce inaccurate or outdated information; for critical applications, developers should add safeguards such as Retrieval-Augmented Generation (RAG), as in the sketch after this list.
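
As a hedged illustration of that RAG safeguard, the sketch below grounds the prompt in retrieved passages. The toy word-overlap retriever and the prompt wording are assumptions standing in for a real search backend (vector store, BM25, etc.) and a vetted template.

```python
# Minimal RAG sketch: ground phi-4's answer in retrieved passages so the model
# is less likely to assert outdated or unverifiable facts.
CORPUS = [
    "phi-4 is a 14.7B parameter decoder-only Transformer from Microsoft Research.",
    "phi-4 supports a context length of 16K tokens and chat-format prompts.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Toy stand-in: rank passages by naive word overlap with the query.
    words = set(query.lower().split())
    scored = sorted(CORPUS, key=lambda p: -len(words & set(p.lower().split())))
    return scored[:k]

def build_grounded_messages(query: str) -> list[dict]:
    context = "\n".join(retrieve(query))
    return [
        {"role": "system",
         "content": "Answer only from the provided context. "
                    "If the context is insufficient, say you do not know."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]

# The resulting message list can be passed to the chat pipeline shown earlier.
print(build_grounded_messages("How long a context does phi-4 support?"))
```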