Overview
Model Overview
Microsoft's Phi-2 is a 2.7 billion parameter Transformer model. It builds on the data sources of its predecessor, Phi-1.5, augmented with new synthetic NLP texts and filtered web content. Training took 14 days on 96 A100-80G GPUs and covered 1.4 trillion tokens drawn from a 250 billion token dataset. Among models with fewer than 13 billion parameters, Phi-2 achieves near state-of-the-art performance on benchmarks for common sense, language understanding, and logical reasoning.
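The model is available on the Hugging Face Hub as microsoft/phi-2 and can be loaded with the transformers library. Below is a minimal sketch; the dtype choice and example prompt are illustrative, not requirements.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch: "microsoft/phi-2" is the public Hub ID; dtype and the
# example prompt are illustrative choices.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float32,  # FP32 avoids the FP16 overflow noted below
)

inputs = tokenizer("Instruct: Why is the sky blue?\nOutput:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```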
Key Capabilities
- Versatile Interaction Formats: Best suited for prompts in QA, chat, and code formats (see the prompt sketches after this list).
- Research Focus: Released as a non-restricted small model to facilitate research into critical safety challenges like toxicity reduction, societal bias understanding, and enhancing controllability.
- Code Generation: Capable of generating Python code, particularly with common packages like typing, math, random, collections, datetime, and itertools.
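The templates below sketch the three prompt formats. The exact strings are illustrative examples, not fixed APIs; as a base model, Phi-2 simply continues free-form text.

```python
# QA format: an "Instruct:/Output:" pair works well for single questions.
qa_prompt = "Instruct: Write a detailed analogy between mathematics and a lighthouse.\nOutput:"

# Chat format: alternating speaker turns; the model continues the dialogue.
chat_prompt = "Alice: What is the boiling point of water?\nBob:"

# Code format: a function signature plus docstring; the model completes the body.
code_prompt = '''def print_primes(n):
    """
    Print all primes between 1 and n.
    """
'''
```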
Limitations and Considerations
- Base Model: Phi-2 is a base model that has not undergone instruction fine-tuning or reinforcement learning from human feedback; as a result, it may follow complex instructions unreliably and produce verbose output.
- Accuracy: May generate inaccurate code and facts; outputs should be treated as starting points, not definitive solutions.
- Language: Primarily designed for standard English; it may struggle to comprehend informal English or other languages.
- Societal Biases & Toxicity: Despite data filtering, the model may still reflect societal biases or produce harmful content if explicitly prompted.
- Attention Overflow: FP16 inference may hit an attention overflow issue; if needed, this can be addressed by enabling or disabling autocast in PhiAttention.forward() (see the sketch after this list).
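A minimal sketch of two workarounds, assuming the overflow appears during FP16 generation: load the weights in FP32, or disable autocast around the forward pass so attention runs in full precision. The prompt and generation settings here are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Option 1 (assumption): sidestep the issue by loading weights in FP32.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float32
).to(device)

prompt = "Instruct: Explain what numeric overflow means.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Option 2 (assumption): when running under FP16 autocast, disable autocast
# for the forward pass so attention is computed in full precision.
with torch.autocast(device_type=device, enabled=False):
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```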