Overview
Microsoft Phi-2: A Small Yet Capable Language Model
Phi-2 is a 2.7 billion parameter Transformer model developed by Microsoft. It builds on the data sources of Phi-1.5, augmented with new synthetic NLP texts and filtered web data. It achieves strong performance on benchmarks for common sense, language understanding, and logical reasoning, performing near the state of the art among models with fewer than 13 billion parameters.
Key Characteristics & Optimizations
- Compact Size: At 2.7 billion parameters, Phi-2 delivers strong capabilities in a compact footprint.
- Research Focus: Released as an open-source model to facilitate research into critical safety challenges such as reducing toxicity, understanding societal biases, and improving controllability.
- DirectML Optimization: The `microsoft/phi-2-pytdml` version includes specific optimizations for enhanced DirectML (DML) performance, featuring a simplified implementation and operator fusions (`apply_rotary_position_emb`, `multi_head_attention`, `mlp_phi2`) that accelerate inference; see the loading sketch after this list.
- Training Data: Trained on 1.4 trillion tokens drawn from a 250-billion-token dataset combining synthetic NLP data and filtered web content.
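
As a rough sketch of how the DML-optimized variant might be loaded, assuming the `torch-directml` package is installed and that the `microsoft/phi-2-pytdml` checkpoint works with the standard `transformers` Auto classes (the exact loading path may differ):

```python
# Hedged sketch: loading Phi-2's DirectML-optimized variant.
# Assumes torch-directml is installed and a DirectML-capable GPU is present;
# the checkpoint's exact loading interface is an assumption here.
import torch
import torch_directml
from transformers import AutoModelForCausalLM, AutoTokenizer

dml_device = torch_directml.device()  # first available DirectML device

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2-pytdml")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2-pytdml",
    torch_dtype=torch.float16,   # half precision keeps the 2.7B model on consumer GPUs
    trust_remote_code=True,      # the fused operators live in custom modeling code
)
model.to(dml_device)

inputs = tokenizer("Instruct: What is DirectML?\nOutput:", return_tensors="pt").to(dml_device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```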
Intended Uses
Phi-2 is best suited for specific interaction formats; a prompt-format sketch follows this list:
- Question Answering (QA): Effective for standalone questions or structured QA prompts.
- Chat: Capable of engaging in multi-turn conversational exchanges.
- Code Generation: Excels at generating code, particularly Python that uses common packages.
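
A minimal sketch of all three formats, assuming the standard `microsoft/phi-2` checkpoint; the `Instruct:`/`Output:` and alternating-speaker conventions follow the examples published with Phi-2, while the concrete prompts here are illustrative:

```python
# Sketch of the three interaction formats Phi-2 responds to best.
# Prompt conventions follow the published Phi-2 examples; the specific
# prompt contents below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# 1. Question answering: a standalone "Instruct: ... Output:" prompt.
qa_prompt = "Instruct: Explain why the sky appears blue.\nOutput:"

# 2. Chat: alternating named speakers; the model continues the exchange.
chat_prompt = (
    "Alice: I'm struggling to stay focused while studying. Any suggestions?\n"
    "Bob:"
)

# 3. Code generation: a signature plus docstring; the model completes the body.
code_prompt = 'def print_prime(n):\n    """Print all primes between 1 and n."""\n'

for prompt in (qa_prompt, chat_prompt, code_prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    print("-" * 40)
```

Because Phi-2 is a base model that simply continues text, generations often run past the answer (e.g. into the next speaker turn), so in practice the output is trimmed at a stop sequence such as the following speaker tag.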
Limitations
Users should be aware that Phi-2 is a base model and has not undergone instruction fine-tuning or reinforcement learning from human feedback. Consequently, it may:
- Generate inaccurate code and facts.
- Struggle with intricate or nuanced instructions.
- Exhibit societal biases and potentially produce harmful content if explicitly prompted.
- Understand primarily standard English, and produce verbose responses owing to its textbook-like training data.