Anderson-arevalo/phi-2

Text Generation · Concurrency Cost: 1 · Model Size: 3B · Quant: BF16 · Ctx Length: 2k · Published: Apr 24, 2026 · License: MIT · Architecture: Transformer · Open Weights

Phi-2 is a 2.7 billion parameter Transformer model developed by Microsoft, trained on a combination of synthetic NLP texts and filtered web data. It demonstrates near state-of-the-art performance among models under 13 billion parameters in common sense, language understanding, and logical reasoning benchmarks. This model is primarily intended for research into safety challenges and excels in QA, chat, and code generation formats.


Model Overview

Phi-2 is a compact yet powerful Transformer model with 2.7 billion parameters, developed by Microsoft. It builds upon the data sources of its predecessor, Phi-1.5, by incorporating additional synthetic NLP texts and carefully filtered web content. This training methodology has enabled Phi-2 to achieve impressive performance, nearing state-of-the-art results among models with fewer than 13 billion parameters across benchmarks for common sense, language understanding, and logical reasoning.

Notably, Phi-2 has not been fine-tuned with reinforcement learning from human feedback (RLHF). Its primary purpose is to serve the research community as an open-source, non-restricted small model, facilitating exploration into critical safety challenges such as reducing toxicity, understanding societal biases, and enhancing controllability.

Key Capabilities & Intended Uses

  • QA Format: Optimized for question-answering tasks, supporting both standalone questions and an "Instruct: <prompt>\nOutput:" prompt format for concise answers (see the usage sketch after this list).
  • Chat Format: Capable of engaging in multi-turn conversations, making it suitable for dialogue-based applications.
  • Code Format: Proficient in generating code, particularly in Python, for common packages. Users should verify generated code due to its limited scope.
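
To make these formats concrete, here is a minimal usage sketch with the Hugging Face transformers library. The checkpoint id, prompt texts, and generation settings are illustrative assumptions rather than part of this model card; adjust them to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The checkpoint id is an assumption; substitute the hub name or local path
# you actually use (e.g. "microsoft/phi-2" or a copy of this repository).
# Older transformers releases may additionally need trust_remote_code=True.
model_id = "microsoft/phi-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# QA format: "Instruct: <prompt>\nOutput:" elicits a concise answer.
qa_prompt = "Instruct: Explain what an attention head does.\nOutput:"

# Chat format: alternating speaker tags for multi-turn dialogue.
chat_prompt = "Alice: What is the capital of France?\nBob:"

# Code format: a signature and docstring to complete (Python works best).
code_prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'

for prompt in (qa_prompt, chat_prompt, code_prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    # Strip the prompt tokens so only the generated continuation is printed.
    continuation = outputs[0][inputs["input_ids"].shape[-1]:]
    print(tokenizer.decode(continuation, skip_special_tokens=True))
```

As a base model without instruction tuning, Phi-2 is sensitive to the exact prompt layout, so keeping to these formats generally gives the most predictable outputs.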

Limitations

  • Inaccurate Code and Facts: May produce incorrect code or factual statements; outputs should be treated as starting points.
  • Limited Code Scope: Primarily trained on Python with common packages; verification is crucial for other languages or less common packages.
  • Unreliable Instruction Adherence: As a base model without instruction fine-tuning, it may struggle with complex instructions.
  • Language Limitations: Primarily understands standard English; informal language or other languages may lead to misinterpretations.
  • Potential Societal Biases & Toxicity: Despite data filtering, it may reflect societal biases or generate harmful content if explicitly prompted.
  • Verbosity: Can produce irrelevant or extra text due to its textbook-like training data.

Training Details

Phi-2 was trained on 1.4 trillion tokens over 14 days using 96 A100-80G GPUs. The underlying dataset comprised 250 billion tokens, seen over multiple passes to reach the 1.4 trillion training tokens, and combined synthetic NLP data generated by AOAI GPT-3.5 with filtered web data from Falcon RefinedWeb and SlimPajama, assessed by AOAI GPT-4. The model has a context length of 2048 tokens.