Model Overview
Microsoft's Phi-3-Mini-128K-Instruct is a 3.8-billion-parameter, instruction-tuned model designed for efficiency and strong reasoning. It belongs to the Phi-3 family and stands out for its 128K-token context length, which lets it process and understand significantly longer inputs than most models in its size class. The model was trained on a high-quality dataset combining synthetic data and filtered public web data, with an emphasis on reasoning-dense content.
Key Capabilities
- Extended Context Understanding: Supports a 128K token context window, excelling in long document summarization and long-context QA tasks.
- Enhanced Reasoning: Demonstrates robust performance across common sense, language understanding, mathematics, coding, and logical reasoning benchmarks.
- Instruction Following: Improved instruction following and structured output generation through supervised fine-tuning and direct preference optimization.
- Efficiency: Optimized for memory/compute-constrained and latency-bound environments.
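As a sketch of how the instruction-following interface looks in practice, the snippet below renders chat messages into the `<|user|>`/`<|assistant|>`/`<|end|>` prompt format described in the Phi-3 model card. This is a minimal illustration only; in real use, the tokenizer's `apply_chat_template` method is the authoritative source for the template.

```python
def build_phi3_prompt(messages):
    """Render a list of chat messages into Phi-3's prompt format.

    Assumes the <|user|>/<|assistant|>/<|end|> special tokens from the
    Phi-3 model card; tokenizer.apply_chat_template is authoritative.
    """
    parts = []
    for msg in messages:
        # Each turn is tagged with its role and terminated by <|end|>.
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    # Leave the prompt open at an assistant turn for the model to complete.
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = build_phi3_prompt([
    {"role": "user", "content": "Summarize this report in three bullets."},
])
```

The resulting string can be passed to a `transformers` text-generation pipeline loaded with the `microsoft/Phi-3-mini-128k-instruct` checkpoint.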
Performance Highlights
Recent updates in June 2024 show substantial gains, particularly in long-context understanding and code reasoning. For instance, on the RULER benchmark for long context understanding, the updated model achieved an 84.6% average, up from 68.8%. In code understanding (RepoQA), its average score jumped from 32.4% to 77%. It also shows competitive performance on benchmarks like MMLU (69.7%) and HumanEval (60.4%) compared to larger models.
Good For
- Applications requiring strong reasoning in code, math, and logic.
- Deployments in memory/compute-constrained environments.
- Scenarios where low latency is critical.
- Tasks involving long document processing and understanding.
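For the long-document use case, a practical first question is whether a document fits in the 128K-token window at all. The sketch below uses a rough characters-per-token heuristic (an assumption for illustration; accurate counts require the model's tokenizer) to check the budget and to split oversized inputs into chunks.

```python
CONTEXT_TOKENS = 128_000   # Phi-3-Mini-128K context window
CHARS_PER_TOKEN = 4        # rough heuristic, NOT a tokenizer-accurate count

def fits_in_context(text, reserved_for_output=1_000):
    """Roughly check whether a document fits alongside a response budget."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens + reserved_for_output <= CONTEXT_TOKENS

def chunk_text(text, max_tokens=100_000):
    """Split an oversized document into chunks under an estimated token cap."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

With the 128K window, most documents pass `fits_in_context` whole; chunking only becomes necessary for inputs on the order of hundreds of thousands of characters.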