Anderson-arevalo/phi-2

Text Generation · Concurrency Cost: 1 · Model Size: 3B · Quant: BF16 · Ctx Length: 2k · Published: Apr 24, 2026 · License: MIT · Architecture: Transformer · Open Weights

Phi-2 is a 2.7 billion parameter Transformer model developed by Microsoft, trained on a combination of synthetic NLP texts and filtered web data. It demonstrates near state-of-the-art performance among models under 13 billion parameters in common sense, language understanding, and logical reasoning benchmarks. This model is primarily intended for research into safety challenges and excels in QA, chat, and code generation formats.


Model Overview

Phi-2 is a compact yet powerful Transformer model with 2.7 billion parameters, developed by Microsoft. It builds upon the data sources of its predecessor, Phi-1.5, by incorporating additional synthetic NLP texts and carefully filtered web content. This training methodology has enabled Phi-2 to achieve impressive performance, nearing state-of-the-art results among models with fewer than 13 billion parameters across benchmarks for common sense, language understanding, and logical reasoning.

Notably, Phi-2 has not been fine-tuned with reinforcement learning from human feedback (RLHF). Its primary purpose is to serve the research community as an open-source, non-restricted small model, facilitating exploration into critical safety challenges such as reducing toxicity, understanding societal biases, and enhancing controllability.

Key Capabilities & Intended Uses

  • QA Format: Optimized for question-answering tasks, supporting both standalone questions and an "Instruct: <prompt>\nOutput:" prompt format for concise answers (see the usage sketch after this list).
  • Chat Format: Capable of engaging in multi-turn conversations, making it suitable for dialogue-based applications.
  • Code Format: Proficient in generating code, particularly in Python, for common packages. Users should verify generated code due to its limited scope.
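
To make these formats concrete, here is a minimal usage sketch with the Hugging Face transformers library. The checkpoint id, prompt texts, and generation settings are illustrative assumptions rather than part of this model card; adjust them to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The checkpoint id is an assumption; substitute the hub name or local path
# you actually use (e.g. "microsoft/phi-2" or a copy of this repository).
# Older transformers releases may additionally need trust_remote_code=True.
model_id = "microsoft/phi-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# QA format: "Instruct: <prompt>\nOutput:" elicits a concise answer.
qa_prompt = "Instruct: Explain what an attention head does.\nOutput:"

# Chat format: alternating speaker tags for multi-turn dialogue.
chat_prompt = "Alice: What is the capital of France?\nBob:"

# Code format: a signature and docstring to complete (Python works best).
code_prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'

for prompt in (qa_prompt, chat_prompt, code_prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    # Strip the prompt tokens so only the generated continuation is printed.
    continuation = outputs[0][inputs["input_ids"].shape[-1]:]
    print(tokenizer.decode(continuation, skip_special_tokens=True))
```

As a base model without instruction tuning, Phi-2 is sensitive to the exact prompt layout, so keeping to these formats generally gives the most predictable outputs.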

Limitations

  • Inaccurate Code and Facts: May produce incorrect code or factual statements; outputs should be treated as starting points.
  • Limited Code Scope: Primarily trained on Python with common packages; verification is crucial for other languages or less common packages.
  • Unreliable Instruction Adherence: As a base model without instruction fine-tuning, it may struggle with complex instructions.
  • Language Limitations: Primarily understands standard English; informal language or other languages may lead to misinterpretations.
  • Potential Societal Biases & Toxicity: Despite data filtering, it may reflect societal biases or generate harmful content if explicitly prompted.
  • Verbosity: Can produce irrelevant or extra text due to its textbook-like training data.

Training Details

Phi-2 was trained on 1.4 trillion tokens over 14 days using 96 A100-80G GPUs. The underlying dataset comprised 250 billion tokens, seen over multiple passes to reach the 1.4 trillion training tokens, and combined synthetic NLP data generated by AOAI GPT-3.5 with filtered web data from Falcon RefinedWeb and SlimPajama, assessed by AOAI GPT-4. The model has a context length of 2048 tokens.