microsoft/Phi-4-mini-instruct
Text Generation · Open Weights · Model Size: 3.8B · Quant: BF16 · Context Length: 32K · Concurrency Cost: 1 · Published: Feb 19, 2025 · License: MIT · Architecture: Transformer

microsoft/Phi-4-mini-instruct is a 3.8 billion parameter instruction-tuned, decoder-only Transformer model from Microsoft with a 128K token context length. Trained on synthetic data and filtered public web content with an emphasis on high-quality, reasoning-dense material, it is optimized for memory- and compute-constrained environments and latency-bound scenarios, and performs especially well on reasoning tasks such as math and logic.

Model Overview

Phi-4-mini-instruct is a 3.8 billion parameter instruction-tuned model developed by Microsoft, part of the Phi-4 family. It features a 128K token context length and incorporates a new architecture for efficiency, a larger vocabulary for multilingual support, and enhanced post-training techniques for instruction following and function calling. The model was trained on 5 trillion tokens, including synthetic "textbook-like" data focused on math, coding, and common sense reasoning, alongside filtered high-quality public documents.
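A quick way to check the context-length and vocabulary figures quoted above is to read them off the published config and tokenizer. A minimal sketch, assuming the Hugging Face transformers library; the expected values in the comments follow the description above rather than a verified dump:

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "microsoft/Phi-4-mini-instruct"

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Expected per the card: a 128K-token context window and a ~200K vocabulary.
print("max positions:", config.max_position_embeddings)
print("vocab size:", len(tokenizer))
```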

Key Capabilities

  • Strong Reasoning: Excels in mathematical and logical reasoning tasks, outperforming similar-sized models on benchmarks like GSM8K (88.6%) and MATH (64.0%).
  • Extended Context: Supports a 128K token context length (this deployment is listed with a 32K serving context), enabling processing of longer inputs and maintaining conversational coherence.
  • Multilingual Support: Features a 200K vocabulary and improved multilingual capabilities, supporting languages such as Arabic, Chinese, French, German, Japanese, and Spanish.
  • Instruction Adherence & Safety: Enhanced through supervised fine-tuning and direct preference optimization for precise instruction following and robust safety measures (a minimal chat-usage sketch follows this list).
  • Function Calling: Designed to support tool-enabled function calling, allowing integration with external tools.
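These capabilities map onto standard chat-style generation. A minimal sketch, assuming the Hugging Face transformers library (a recent version whose text-generation pipeline accepts chat-message lists) and hardware with enough memory for the BF16 weights; the prompt and generation settings are illustrative, not prescribed by the model card:

```python
import torch
from transformers import pipeline

# Load the model as a chat-capable text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",
    torch_dtype=torch.bfloat16,  # matches the BF16 quant listed above
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise math tutor."},
    {"role": "user", "content": "If 3x + 5 = 20, what is x? Show your steps."},
]

# The pipeline applies the model's chat template and returns the full
# conversation; the final message is the assistant's reply.
out = pipe(messages, max_new_tokens=256, do_sample=False)
print(out[0]["generated_text"][-1]["content"])
```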

Good for

  • Memory/Compute Constrained Environments: Its lightweight design makes it suitable for deployment where resources are limited.
  • Latency-Bound Scenarios: Optimized for applications requiring quick response times.
  • General Purpose AI Systems: A versatile building block for various generative AI features and applications.
  • Research Acceleration: Intended to accelerate research in language and multimodal models.

While the model performs well for its size, its capacity to store factual knowledge is limited; in RAG settings, augmenting it with a search engine or other retriever helps mitigate potential factual inaccuracies.
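As an illustration, retrieved snippets can simply be packed into the prompt before generation. A minimal sketch; the helper name and citation format are hypothetical, and any retriever (web search, vector store) could supply the documents:

```python
def build_rag_messages(question: str, retrieved_docs: list[str]) -> list[dict]:
    # Hypothetical helper: pack retrieved snippets into a chat prompt so the
    # model answers from the provided context rather than from memory.
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return [
        {
            "role": "system",
            "content": (
                "Answer using only the numbered context below. Cite sources "
                "like [1]. If the context lacks the answer, say so."
            ),
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

# Usage: pass the messages to any chat-capable client or pipeline.
messages = build_rag_messages(
    "When was Phi-4-mini-instruct published?",
    ["microsoft/Phi-4-mini-instruct was published on Feb 19, 2025 (MIT license)."],
)
```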

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model cover the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
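A sketch of applying such a configuration, assuming an OpenAI-compatible chat completions endpoint and the official openai Python client; the values below are illustrative placeholders rather than the actual top user configs, and samplers outside the OpenAI spec (top_k, min_p, repetition_penalty) are passed via extra_body on the assumption that the serving backend accepts them:

```python
from openai import OpenAI

# Assumed OpenAI-compatible endpoint; substitute your own base URL and key.
client = OpenAI(base_url="https://api.featherless.ai/v1", api_key="YOUR_API_KEY")

resp = client.chat.completions.create(
    model="microsoft/Phi-4-mini-instruct",
    messages=[{"role": "user", "content": "Write a haiku about careful reasoning."}],
    # Standard OpenAI sampler parameters (illustrative values):
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Non-standard samplers go through extra_body, assuming backend support:
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.1},
)
print(resp.choices[0].message.content)
```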