microsoft/phi-2-pytdml

2.7B parameters · BF16 · 2048 context length · License: MIT
Overview

Microsoft Phi-2: A Small Yet Capable Language Model

Phi-2 is a 2.7 billion parameter Transformer model developed by Microsoft. It builds on the data sources used for Phi-1.5, augmented with new synthetic NLP texts and filtered web data. On benchmarks for common sense, language understanding, and logical reasoning, it achieves near state-of-the-art performance among models under 13 billion parameters.

Key Characteristics & Optimizations

  • Compact Size: At 2.7 billion parameters, Phi-2 offers strong capabilities in a smaller footprint.
  • Research Focus: Released as an open-source model to facilitate research into critical safety challenges like toxicity reduction, bias understanding, and controllability.
  • DirectML Optimization: The microsoft/phi-2-pytdml variant is optimized for DirectML (DML) execution, featuring a simplified implementation and operator fusions (apply_rotary_position_emb, multi_head_attention, mlp_phi2) that accelerate inference; see the loading sketch after this list.
  • Training Data: Trained for 1.4 trillion tokens over a 250-billion-token dataset combining synthetic NLP data and filtered web content.
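
As a minimal sketch of how this checkpoint might be run on DirectML, the following assumes the torch-directml package supplies the DML device and that the model loads through transformers the same way as the base microsoft/phi-2; confirm the exact supported workflow against the repository files.

```python
# Hedged sketch: running the DirectML-optimized Phi-2 on a DML device.
# Assumptions: torch-directml is installed, and microsoft/phi-2-pytdml loads
# via transformers with trust_remote_code, like the base microsoft/phi-2.
import torch
import torch_directml
from transformers import AutoModelForCausalLM, AutoTokenizer

dml = torch_directml.device()  # first available DirectML device (DX12-capable GPU)

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2-pytdml", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2-pytdml",
    torch_dtype=torch.float16,  # assumption: fp16 at runtime; the card lists BF16 weights
    trust_remote_code=True,
).to(dml)

prompt = "Instruct: Summarize what DirectML is in one sentence.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(dml)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```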

Intended Uses

Phi-2 is best suited for the following interaction formats (illustrative prompt templates appear after the list):

  • Question Answering (QA): Effective for standalone questions or structured QA prompts.
  • Chat: Capable of engaging in multi-turn conversational exchanges.
  • Code Generation: Excels at generating code, particularly Python that uses common packages.
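
The prompt templates below follow the conventions documented for the base Phi-2 model; treat the exact wording as an assumption and verify it against the model card.

```python
# Illustrative prompt formats for the three interaction styles above.
# The "Instruct:/Output:" and named-speaker templates follow the base Phi-2
# conventions; the exact templates are assumptions to verify against the card.

# Question answering: a standalone question in the Instruct/Output template.
qa_prompt = "Instruct: Why is the sky blue?\nOutput:"

# Chat: alternating named speakers; generation continues the final turn.
chat_prompt = (
    "Alice: I'm stuck on a segfault in my C program, any tips?\n"
    "Bob: Have you tried running it under a debugger?\n"
    "Alice: Not yet, which one would you recommend?\n"
    "Bob:"
)

# Code generation: supply a signature and docstring for the model to complete.
code_prompt = '''def print_prime(n):
    """Print all primes between 1 and n."""
'''
```

Any of these strings can be passed as the prompt in the generation sketch shown earlier.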

Limitations

Users should be aware that Phi-2 is a base model and has not undergone instruction fine-tuning or reinforcement learning from human feedback. Consequently, it may:

  • Generate inaccurate code and facts.
  • Struggle with intricate or nuanced instructions.
  • Exhibit societal biases and potentially produce harmful content if explicitly prompted.
  • Primarily understand standard English; informal usage, slang, and other languages may pose challenges.
  • Produce verbose, textbook-like responses, reflecting the style of its training data.