StanfordAIMI/GREEN-Phi2

Parameters: 3B · Precision: BF16 · Context length: 2048 · License: MIT

Overview

StanfordAIMI/GREEN-Phi2 is a 3-billion-parameter language model derived from the Microsoft Phi-2 architecture. It has been further fine-tuned on an undisclosed dataset, reaching a final validation loss of 0.0781 during training, and retains Phi-2's context length of 2048 tokens.
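
The card does not include loading instructions, but the checkpoint can presumably be loaded like any causal language model in the Hugging Face transformers library. The sketch below is an assumption based on the standard Phi-2 layout, not an official recipe; the model ID is the repository name, and the BF16 dtype matches the tensor type listed above.

    # Minimal loading sketch; assumes standard transformers causal-LM support (not taken from the card)
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "StanfordAIMI/GREEN-Phi2"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # weights are published in BF16
        device_map="auto",           # requires `accelerate`; spreads the model over available devices
    )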

Training Details

The fine-tuning run used a learning rate of 0.0001 and lasted 12 epochs, distributed across 8 GPUs. The total effective batch size was 2048, obtained from a per-device train_batch_size of 8 and gradient_accumulation_steps of 32 across the 8 GPUs (8 × 32 × 8 = 2048). The optimizer was Adam (its beta and epsilon values are not restated here), combined with a cosine learning-rate scheduler and a warmup ratio of 0.05.
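
For reference, these hyperparameters map naturally onto a Hugging Face TrainingArguments configuration. The sketch below is an illustrative reconstruction rather than the actual training script; the output directory is hypothetical, and the Adam beta/epsilon values are left at library defaults because the card does not restate them.

    # Illustrative reconstruction of the reported hyperparameters (not the original training script)
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="green-phi2-finetune",  # hypothetical output path
        learning_rate=1e-4,                # reported learning rate (0.0001)
        per_device_train_batch_size=8,     # reported train_batch_size
        gradient_accumulation_steps=32,    # with 8 GPUs: 8 * 32 * 8 = 2048 effective batch size
        num_train_epochs=12,               # reported number of epochs
        lr_scheduler_type="cosine",        # reported scheduler
        warmup_ratio=0.05,                 # reported warmup ratio
        bf16=True,                         # matches the BF16 tensor type listed on the card
        # adam_beta1 / adam_beta2 / adam_epsilon left at defaults; exact values not restated in the card
    )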

Key Characteristics

  • Base Model: Microsoft Phi-2, known for its compact size and strong performance.
  • Parameter Count: 3 billion parameters.
  • Context Length: 2048 tokens (see the truncation sketch after this list).
  • Training Result: a final validation loss of 0.0781, indicating effective learning on the fine-tuning dataset.
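
Because inputs longer than the context window must be truncated or chunked, the snippet below sketches one way to cap inputs at 2048 tokens; the long_text variable is a placeholder, not an example from the card.

    # Sketch of keeping inputs within the 2048-token context window (illustrative only)
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("StanfordAIMI/GREEN-Phi2")

    long_text = "..."  # placeholder for a long input document
    inputs = tokenizer(
        long_text,
        truncation=True,
        max_length=2048,   # matches the model's reported context length
        return_tensors="pt",
    )
    print(inputs["input_ids"].shape)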

Intended Use Cases

Specific intended uses and limitations are not detailed in the available information. As a fine-tuned variant of Phi-2, however, the model is generally suited to tasks that require efficient language understanding and generation, particularly where compute or memory constraints rule out larger LLMs.
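
As a hedged illustration of generic text generation with the checkpoint, the snippet below uses the transformers pipeline API; the prompt and generation settings are placeholders rather than recommendations from the card.

    # Generic text-generation sketch; prompt and settings are placeholders, not from the model card
    import torch
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="StanfordAIMI/GREEN-Phi2",
        torch_dtype=torch.bfloat16,  # weights are published in BF16
        device_map="auto",
    )

    prompt = "Summarize the following findings:"  # hypothetical prompt
    output = generator(prompt, max_new_tokens=128, do_sample=False)
    print(output[0]["generated_text"])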