microsoft/Phi-3.5-mini-instruct

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 4k · Published: Aug 16, 2024 · License: MIT · Architecture: Transformer · Open Weights

microsoft/Phi-3.5-mini-instruct is a 3.8 billion parameter instruction-tuned decoder-only Transformer model developed by Microsoft, featuring a 128K token context length. Trained on reasoning-dense data, it performs strongly on reasoning tasks, particularly code, math, and logic, and demonstrates competitive multilingual capabilities. The model is intended for commercial and research use in memory- and compute-constrained or latency-bound environments.


Overview

Phi-3.5-mini-instruct is a 3.8 billion parameter instruction-tuned model from Microsoft's Phi-3.5 family, built upon high-quality, reasoning-dense synthetic and filtered public datasets. It supports an extensive 128K token context length and has been enhanced through supervised fine-tuning, proximal policy optimization, and direct preference optimization for instruction adherence and safety. This model updates the June 2024 Phi-3 Mini release, offering substantial gains in multilingual support, multi-turn conversation quality, and reasoning capability.
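As an instruction-tuned chat model, Phi-3.5-mini-instruct expects its inputs in the Phi-3 family's chat format. A minimal sketch of building such a prompt and running it with Hugging Face `transformers` is shown below; the special tokens (`<|system|>`, `<|user|>`, `<|end|>`, `<|assistant|>`) follow the Phi-3 chat template, but in practice you should rely on the tokenizer's built-in `apply_chat_template` rather than hand-rolling the format.

```python
def build_phi_prompt(messages):
    """Render a list of {"role", "content"} dicts into the Phi-3 chat format.

    Illustrative only -- the tokenizer's apply_chat_template is authoritative.
    """
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # generation prompt for the model's turn
    return "".join(parts)


if __name__ == "__main__":
    # Heavy path: downloads ~7-8 GB of BF16 weights; shown for illustration only.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3.5-mini-instruct", torch_dtype="auto", device_map="auto"
    )
    prompt = build_phi_prompt(
        [{"role": "user", "content": "Solve 12 * 7 and explain briefly."}]
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The guard around the model-loading code keeps the prompt-formatting helper importable without pulling the weights.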

Key Capabilities

  • Strong Reasoning: Excels in code, math, and logic tasks, achieving high scores on benchmarks like GSM8K (86.2) and MATH (48.5).
  • Multilingual Performance: Demonstrates competitive performance on multilingual MMLU (55.4 average) and other multilingual benchmarks across 20+ languages, despite its small size.
  • Long Context Understanding: Supports a 128K token context, enabling tasks like long document summarization, QA, and information retrieval; outperforms some larger models on long-context benchmarks such as Qasper (41.9).
  • Code Generation: Achieves strong results in code generation benchmarks, with HumanEval at 62.8 and MBPP at 69.6.

Good for

  • Memory/Compute Constrained Environments: Its lightweight nature makes it suitable for deployment where resources are limited.
  • Latency-Bound Scenarios: Designed for applications requiring quick response times.
  • General Purpose AI Systems: Serves as a building block for generative AI features, particularly where strong reasoning is critical.
  • Research: Accelerates research on language models, especially for understanding performance in smaller, highly optimized models.

Popular Sampler Settings

Featherless users most commonly tune the following sampler parameters for this model; the top three value combinations are shown in the configuration tabs on the model page.

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
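These settings can be passed alongside the chat messages in a request body, for example to an OpenAI-compatible chat completions endpoint. The sketch below shows the request shape only; the values are placeholders, not recommended settings, and the assumption that the serving endpoint accepts all of these fields (e.g. `min_p`, `repetition_penalty`) should be checked against the provider's API documentation.

```python
import json


def make_chat_request(messages, **sampler):
    """Assemble a chat-completions request body combining messages with sampler settings."""
    body = {"model": "microsoft/Phi-3.5-mini-instruct", "messages": messages}
    body.update(sampler)  # sampler kwargs become top-level request fields
    return body


request = make_chat_request(
    [{"role": "user", "content": "Summarize this document in two sentences."}],
    temperature=0.7,        # randomness of sampling (placeholder value)
    top_p=0.9,              # nucleus-sampling probability cutoff
    top_k=40,               # restrict sampling to the k most likely tokens
    frequency_penalty=0.0,  # penalize tokens by how often they have appeared
    presence_penalty=0.0,   # penalize tokens that have appeared at all
    repetition_penalty=1.1, # multiplicative penalty on repeated tokens
    min_p=0.05,             # drop tokens below this fraction of the top probability
)
print(json.dumps(request, indent=2))
```

Because the helper just merges keyword arguments into the body, any subset of the parameters can be supplied per request.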