microsoft/Phi-3.5-MoE-instruct

TEXT GENERATION
Concurrency Cost: 3 | Model Size: 41.9B | Quant: FP8 | Ctx Length: 32k | Published: Aug 17, 2024 | License: mit | Architecture: Transformer | 0.6K | Open Weights | Cold

microsoft/Phi-3.5-MoE-instruct is a 41.9 billion parameter Mixture-of-Experts (MoE) decoder-only Transformer model developed by Microsoft, with 6.6 billion active parameters and a 128K token context length. Trained on 4.9 trillion tokens with a focus on high-quality, reasoning-dense data, it is strongest on reasoning tasks, particularly code, math, and logic. The model is optimized for memory/compute-constrained environments and latency-bound scenarios, and it is competitive with larger models in multilingual and long-context applications.


Overview

Microsoft's Phi-3.5-MoE-instruct is a 41.9 billion parameter Mixture-of-Experts (MoE) model in which 6.6 billion parameters are active at a time, which keeps inference cost well below that of a dense model of the same size. It is trained on high-quality, reasoning-dense datasets, including synthetic data and filtered public documents, and supports a 128K token context length. Post-training combined supervised fine-tuning, proximal policy optimization, and direct preference optimization, aimed at precise instruction adherence and robust safety.
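To make the total versus active parameter distinction concrete, the short sketch below works through the arithmetic using the figures quoted on this page (41.9B total, 6.6B active, FP8 quantization). The helper names are illustrative, and the memory estimate assumes exactly 1 byte per parameter for FP8 and 2 bytes for BF16, counting weights only (no KV cache, activations, or runtime overhead).

```python
# Figures quoted on this page.
TOTAL_PARAMS = 41.9e9   # all experts plus shared layers
ACTIVE_PARAMS = 6.6e9   # parameters actually used in a forward step


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that participate in a forward step."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1e9 bytes); weights only (assumption)."""
    return params * bytes_per_param / 1e9


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1)    # assumes 1 byte/param for FP8
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)   # assumes 2 bytes/param for BF16

print(f"active fraction: {frac:.1%}")
print(f"weights at FP8:  {fp8_gb:.1f} GB")
print(f"weights at BF16: {bf16_gb:.1f} GB")
```

Under these assumptions, compute per step scales with the roughly 16% of parameters that are active, while memory must still hold all 41.9B parameters, since any expert may be selected.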

Key Capabilities

  • Strong Reasoning: Excels in tasks requiring code, math, and logic, often outperforming larger models in these categories.
  • Multilingual Support: Demonstrates competitive performance across various multilingual benchmarks, including Multilingual MMLU and MGSM.
  • Extended Context Understanding: Supports 128K context length, making it suitable for long document summarization, QA, and multilingual context retrieval.
  • Efficiency: Designed for memory/compute-constrained environments and latency-bound scenarios, since only 6.6 billion of its 41.9 billion parameters are active at a time.
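As a rough illustration of why an MoE model touches only part of its parameters, the following is a generic top-k routing sketch in plain Python. It is not Microsoft's implementation: the number of experts (4), the value of k (2), the toy "experts" that simply scale their input, and the choice to softmax-normalize over the selected experts are all assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Normalize only over the selected experts (one common choice, assumed here).
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; other experts are never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy setup: four experts that scale their input by 1, 2, 3, 4.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.0]   # made-up router scores for one token

selected = route_top_k(logits, k=2)
y = moe_forward(1.0, experts, logits, k=2)
print("selected experts:", [i for i, _ in selected])
print("output:", y)
```

Only the experts chosen by the router run for a given input, which is the mechanism behind the gap between total and active parameter counts.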

Good for

  • General-purpose AI systems requiring strong reasoning capabilities.
  • Applications in memory or compute-constrained environments where efficiency is critical.
  • Latency-sensitive scenarios, where the small active parameter count keeps per-token compute low.
  • Research and commercial use in multiple languages, particularly for tasks involving long documents or complex logical problems.