StanfordAIMI/GREEN-Phi2

Hugging Face · Text Generation · Model Size: 3B · Quantization: BF16 · Context Length: 2k · Published: Mar 28, 2024 · License: MIT · Architecture: Transformer · Open Weights

StanfordAIMI/GREEN-Phi2 is a 3-billion-parameter causal language model fine-tuned from Microsoft's Phi-2, with a 2048-token context length. The fine-tuning dataset is not specified; training reached a final validation loss of 0.0781. The model is intended for general language generation tasks, building on the compact yet capable design of the original Phi-2.


StanfordAIMI/GREEN-Phi2 Overview

StanfordAIMI/GREEN-Phi2 is a 3-billion-parameter language model derived from Microsoft's Phi-2. It has been further fine-tuned on an undisclosed dataset, reaching a final validation loss of 0.0781, and retains Phi-2's 2048-token context length.
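The card does not include usage code, but a minimal loading sketch, assuming the checkpoint exposes the standard Hugging Face causal-LM interface, might look like this (the repo id is taken from the card; everything else is illustrative):

```python
# Minimal loading sketch (assumed standard causal-LM interface; not an official example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "StanfordAIMI/GREEN-Phi2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",           # place weights on available GPU(s) or CPU
)
```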

Training Details

Fine-tuning used a learning rate of 0.0001 and a total train batch size of 2048 (a per-device train_batch_size of 8 with gradient_accumulation_steps of 32, across 8 GPUs), and ran for 12 epochs. The optimizer was Adam (the exact beta and epsilon values are listed on the model card), paired with a cosine learning rate scheduler and a warmup ratio of 0.05.
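For reference, the hyperparameters above can be expressed as a Hugging Face TrainingArguments configuration. This is a reconstruction from the reported values, not the authors' actual training script; the output directory and optimizer variant are assumptions, and dataset and model handling are omitted.

```python
# Hyperparameter sketch reconstructed from the training details above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="green-phi2-finetune",   # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=8,      # train_batch_size
    gradient_accumulation_steps=32,     # 8 x 32 x 8 GPUs = 2048 total batch size
    num_train_epochs=12,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_torch",                # Adam-style optimizer; exact variant assumed
    bf16=True,                          # matches the BF16 precision listed above
)
```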

Key Characteristics

  • Base Model: Microsoft Phi-2, known for its compact size and strong performance.
  • Parameter Count: 3 billion parameters.
  • Context Length: 2048 tokens.
  • Training Result: A final validation loss of 0.0781 on its (unspecified) fine-tuning dataset.

Intended Use Cases

Specific intended uses and limitations are not detailed in the provided information. As a fine-tuned version of Phi-2, however, the model is generally suited to efficient language understanding and generation, particularly where resource constraints matter, given its relatively small size compared to larger LLMs.
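As an illustration only (the card does not specify prompt formats or downstream tasks), the model can be exercised through the standard text-generation pipeline; the prompt below is a placeholder:

```python
# Illustrative generation call via the text-generation pipeline (assumed usage, not from the card).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="StanfordAIMI/GREEN-Phi2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Placeholder prompt; replace with task-appropriate input.
result = generator("Summarize the key findings:", max_new_tokens=128)
print(result[0]["generated_text"])
```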