microsoft/Phi-3.5-MoE-instruct
microsoft/Phi-3.5-MoE-instruct is a 41.9-billion-parameter Mixture-of-Experts (MoE) decoder-only Transformer developed by Microsoft. It has 6.6 billion active parameters per token and a 128K-token context length. It was trained on 4.9 trillion tokens, with an emphasis on high-quality, reasoning-dense data, and it performs strongly on reasoning tasks, particularly code, math, and logic. The model targets memory/compute-constrained environments and latency-bound scenarios, and it is competitive with larger models on multilingual and long-context tasks.
Overview
Microsoft's Phi-3.5-MoE-instruct is a 41.9-billion-parameter Mixture-of-Experts (MoE) model that activates only 6.6 billion parameters per token, which keeps inference efficient. It was trained on high-quality, reasoning-dense datasets, including synthetic data and filtered public documents, and supports a 128K-token context length. Post-training combined supervised fine-tuning (SFT), proximal policy optimization (PPO), and direct preference optimization (DPO) to strengthen instruction adherence and safety.
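The gap between total and active parameters comes from sparse expert routing: a router scores every expert for each token, but only the highest-scoring experts actually run. The sketch below illustrates generic top-2 softmax gating in plain Python. It is an assumption-laden toy, not Phi-3.5-MoE's actual router (Microsoft's card describes 16 experts with 2 active per token, trained with its own routing method). The linear router, the `top2_moe` helper, and the scaling "experts" are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(x, router_weights, experts):
    """Route one token's hidden state x through its top-2 experts (hypothetical helper).

    x: hidden state as a list of floats
    router_weights: one weight vector per expert (a toy linear router)
    experts: one callable per expert, each mapping a vector to a vector
    Returns (output vector, indices of the chosen experts, their gate weights).
    """
    # Router score per expert: dot product of that expert's weight vector with x
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the two highest-scoring experts; all other experts are skipped
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    # Renormalize the two selected scores into mixing weights that sum to 1
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for g, i in zip(gates, chosen):
        y = experts[i](x)  # only 2 expert forward passes per token
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen, gates


# Toy setup: 16 experts (matching the reported expert count), 4-dim hidden state
NUM_EXPERTS, DIM = 16, 4
rng = random.Random(0)
router = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# Each toy "expert" just scales its input by (index + 1)
experts = [(lambda v, s=i + 1: [s * vi for vi in v]) for i in range(NUM_EXPERTS)]

x = [0.5, -0.2, 0.1, 0.9]
y, chosen, gates = top2_moe(x, router, experts)
print("experts run:", chosen, "gates:", [round(g, 3) for g in gates])
```

Only 2 of the 16 expert functions execute for this token. In the real model the shared layers (such as attention) also run for every token, which is why the active count (6.6B) is larger than two experts' worth of weights alone.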
Key Capabilities
- Strong Reasoning: Performs well on code, math, and logic tasks, often outperforming larger models in these categories.
- Multilingual Support: Demonstrates competitive performance across various multilingual benchmarks, including Multilingual MMLU and MGSM.
- Extended Context Understanding: Supports a 128K-token context length, making it suitable for long-document summarization, long-document QA, and multilingual context retrieval.
- Efficiency: Designed for memory/compute-constrained environments and latency-bound scenarios, since only 6.6B of its 41.9B parameters are active for each token.
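A back-of-the-envelope calculation helps separate the two kinds of efficiency: per-token compute scales with the active parameters, while the memory needed to hold the weights still scales with the total, because every expert must be loaded. The sketch below assumes the common rule of thumb of about 2 FLOPs per parameter used per token and ignores the KV cache, activations, and attention's context-length term. The helper names are hypothetical.

```python
# Figures from the model description; everything else is a rough rule of thumb.
TOTAL_PARAMS = 41.9e9   # all experts plus shared layers
ACTIVE_PARAMS = 6.6e9   # parameters actually used for each token


def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed just to hold the weights (ignores KV cache and activations)."""
    return num_params * bytes_per_param / 1e9


def forward_flops_per_token(params_used):
    """Approximate forward-pass cost: ~2 FLOPs (multiply + add) per parameter used."""
    return 2 * params_used


# Every expert must be resident, so weight memory follows the TOTAL count
for label, nbytes in [("bf16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"{label:>5} weights: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):.1f} GB")

# Per-token compute follows the ACTIVE count
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)
print(f"per-token compute: ~{moe_flops / 1e9:.1f} GFLOPs "
      f"({moe_flops / dense_flops:.0%} of a dense 41.9B model)")
```

Under these assumptions, bf16 weights take roughly 84 GB, while each token costs about 13 GFLOPs, roughly 16% of what a dense model of the same total size would need. Quantization is the usual lever when weight memory, rather than compute, is the binding constraint.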
Good for
- General-purpose AI systems requiring strong reasoning capabilities.
- Applications in memory or compute-constrained environments where efficiency is critical.
- Latency-sensitive scenarios, where the small active parameter count keeps per-token compute low.
- Research and commercial use in multiple languages, particularly for tasks involving long documents or complex logical problems.