microsoft/phi-2 is a 2.7 billion parameter Transformer-based causal language model developed by Microsoft. Trained on a mix of synthetic NLP texts and filtered web data, it demonstrates near state-of-the-art performance among models under 13 billion parameters on common sense, language understanding, and logical reasoning benchmarks. The model is primarily intended for research into safety challenges and works best with prompts in QA, chat, and code formats.
Model Overview
Microsoft's Phi-2 is a 2.7 billion parameter Transformer model. It builds on the data sources of its predecessor, Phi-1.5, augmented with new synthetic NLP texts and filtered web content. It was trained for 14 days on 96 A100-80G GPUs, processing 1.4 trillion tokens drawn from a 250 billion token dataset. Phi-2 achieves near state-of-the-art performance on benchmarks for common sense, language understanding, and logical reasoning among models smaller than 13 billion parameters.
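The training figures above imply a few derived quantities that are easy to check. The following is a rough back-of-the-envelope sketch, assuming all 96 GPUs ran for the full 14 days and that every pass over the dataset counts equally; neither assumption is confirmed by the text.

```python
# Figures quoted in the overview above.
DATASET_TOKENS = 250e9     # 250 billion tokens in the dataset
TRAINING_TOKENS = 1.4e12   # 1.4 trillion tokens processed during training
NUM_GPUS = 96
TRAINING_DAYS = 14

# Approximate number of passes over the dataset.
epochs = TRAINING_TOKENS / DATASET_TOKENS

# Total GPU-hours, assuming every GPU ran for the whole training period.
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24

# Average throughput per GPU under the same assumption.
tokens_per_gpu_second = TRAINING_TOKENS / (gpu_hours * 3600)

print(f"passes over the dataset: about {epochs:.1f}")
print(f"GPU-hours: {gpu_hours}")
print(f"tokens per GPU-second: about {tokens_per_gpu_second:,.0f}")
```

In other words, the model saw the dataset several times over rather than once, which is why the processed token count exceeds the dataset size.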
Key Capabilities
- Versatile Interaction Formats: Best suited for prompts in QA, chat, and code formats.
- Research Focus: Released as a non-restricted small model to facilitate research into critical safety challenges like toxicity reduction, societal bias understanding, and enhancing controllability.
- Code Generation: Capable of generating Python code, particularly with common packages like typing, math, random, collections, datetime, and itertools.
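The three prompt formats mentioned above can be sketched as plain string templates. The exact templates are an assumption here, since this page does not spell them out: the "Instruct:/Output:" and "Alice:/Bob:" patterns follow the convention commonly shown for Phi-2, and the helper names (qa_prompt, chat_prompt, code_prompt) are hypothetical.

```python
def qa_prompt(question: str) -> str:
    """QA format: an instruction followed by an open 'Output:' slot."""
    return f"Instruct: {question}\nOutput:"


def chat_prompt(turns: list[tuple[str, str]], next_speaker: str) -> str:
    """Chat format: named speaker turns, ending with the speaker the model should continue."""
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


def code_prompt(signature: str, docstring: str) -> str:
    """Code format: a function signature plus docstring for the model to complete."""
    return f'{signature}\n    """\n    {docstring}\n    """\n'


if __name__ == "__main__":
    print(qa_prompt("Explain what a causal language model is."))
    print(chat_prompt([("Alice", "How do I reverse a list in Python?")], "Bob"))
    print(code_prompt("def is_prime(n: int) -> bool:", "Return True if n is prime."))
```

Each helper only builds the prompt text; passing it to the model and decoding the completion is left to whichever inference stack is in use.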
Limitations and Considerations
- Base Model: Phi-2 is a base model and has not undergone instruction fine-tuning or reinforcement learning from human feedback, which may lead to unreliable responses to complex instructions and verbosity.
- Accuracy: May generate inaccurate code and facts; outputs should be treated as starting points, not definitive solutions.
- Language: Primarily designed for standard English; informal English or other languages may pose comprehension challenges.
- Societal Biases & Toxicity: Despite data filtering, the model may still reflect societal biases or produce harmful content if explicitly prompted.
- Attention Overflow: Users may encounter attention overflow issues with FP16, requiring specific handling in PhiAttention.forward().
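To illustrate why attention scores can overflow in FP16, the sketch below emulates half-precision storage with Python's standard struct module. It is not the code in PhiAttention.forward(), and the query and key values are hypothetical; the point is only that half precision has a largest finite value of 65504, so a large dot product that is finite in higher precision becomes infinite once stored as FP16.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite value in IEEE 754 half precision


def to_fp16(value: float) -> float:
    """Round-trip a float through half precision; out-of-range values become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses to pack values beyond the FP16 range; model that as inf.
        return math.copysign(math.inf, value)


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product, computed in Python's 64-bit floats."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query/key vectors with large activations.
query = [120.0] * 8
key = [120.0] * 8

score_high_precision = dot(query, key)        # finite in 64-bit floats
score_fp16 = to_fp16(score_high_precision)    # exceeds FP16_MAX when stored as FP16

print("high precision:", score_high_precision)
print("as FP16:", score_fp16)
print("overflowed:", math.isinf(score_fp16))
```

Once a score is infinite, the softmax that follows can yield NaN values, which is one reason the attention computation may need to be kept in a wider type than FP16.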