microsoft/phi-2

Status: Warm
Visibility: Public
Parameters: 3B
Precision: BF16
Context length: 2048 tokens
Released: Dec 13, 2023
License: MIT
Source: Hugging Face
Model Overview

Microsoft's Phi-2 is a 2.7-billion-parameter Transformer language model. It builds on the data sources of its predecessor, Phi-1.5, augmented with new synthetic NLP texts and filtered web content. Training took 14 days on 96 A100-80G GPUs and processed 1.4 trillion tokens drawn from a 250-billion-token dataset (roughly 5-6 passes over the data). Among models under 13 billion parameters, Phi-2 achieves near state-of-the-art results on benchmarks for common sense, language understanding, and logical reasoning.
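
As a reference point, here is a minimal loading sketch using the Hugging Face transformers library; the repository ID matches the page title, while the dtype and device placement are illustrative choices:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Phi-2 has native support in transformers >= 4.37; on older versions the
# model card advises passing trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",           # requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
```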

Key Capabilities

  • Versatile Interaction Formats: Best suited to prompts in QA, chat, and code formats (see the generation sketch after this list).
  • Research Focus: Released as a non-restricted small model to facilitate research into critical safety challenges, such as reducing toxicity, understanding societal biases, and improving controllability.
  • Code Generation: Capable of generating Python code, particularly with common packages like typing, math, random, collections, datetime, and itertools.
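
To illustrate these formats, here is a minimal generation sketch, assuming the model and tokenizer have been loaded as in the overview above; the prompts themselves are made-up examples of the QA and code formats:

```python
# QA format: "Instruct: <question>\nOutput:" (the format the model card recommends).
prompt = "Instruct: Explain what a hash table is in one sentence.\nOutput:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Chat format: alternating speaker turns, e.g. "Alice: ...\nBob:".
# Code format: a Python signature plus docstring for the model to complete.
code_prompt = '''def median(values):
    """Return the median of a list of numbers."""
'''
inputs = tokenizer(code_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```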

Limitations and Considerations

  • Base Model: Phi-2 is a base model and has not undergone instruction fine-tuning or reinforcement learning from human feedback; as a result, it may respond unreliably to complex instructions and produce verbose output.
  • Accuracy: May generate inaccurate code and facts; outputs should be treated as starting points, not definitive solutions.
  • Language: Primarily trained on standard English; the model may struggle with informal English, slang, or languages other than English.
  • Societal Biases & Toxicity: Despite data filtering, the model may still reflect societal biases or produce harmful content if explicitly prompted.
  • Attention Overflow: With FP16 inference, users may encounter attention overflow; the suggested handling is to enable or disable autocast in PhiAttention.forward() (a hedged sketch follows this list).
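
For the FP16 overflow noted above, the upstream guidance is to enable or disable autocast inside PhiAttention.forward(). Below is a hypothetical monkey-patch sketch of that idea; the import path assumes the Phi implementation bundled with recent transformers releases, and disabling autocast is only one of the two toggles the guidance mentions:

```python
import torch
from transformers.models.phi.modeling_phi import PhiAttention  # assumes transformers >= 4.37

_orig_forward = PhiAttention.forward

def _forward_fp32_attention(self, *args, **kwargs):
    # Run the attention block with autocast disabled so intermediate
    # attention scores are not downcast to FP16 by autocast.
    with torch.autocast(device_type="cuda", enabled=False):
        return _orig_forward(self, *args, **kwargs)

PhiAttention.forward = _forward_fp32_attention
```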