StanfordAIMI/GREEN-Phi2 Overview
StanfordAIMI/GREEN-Phi2 is a language model with approximately 2.7 billion parameters, fine-tuned from Microsoft's Phi-2. The fine-tuning dataset is not disclosed; the reported final validation loss is 0.0781. The model retains Phi-2's context length of 2048 tokens.
Training Details
Fine-tuning used a learning rate of 0.0001 and an effective batch size of 2048, obtained from a per-device train batch size of 8, gradient accumulation over 32 steps, and data parallelism across 8 GPUs (8 × 32 × 8 = 2048). Training ran for 12 epochs with the Adam optimizer (beta and epsilon values as reported in the training configuration) and a cosine learning-rate schedule with a warmup ratio of 0.05.
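These hyperparameters map naturally onto a Hugging Face TrainingArguments configuration. The sketch below is a reconstruction for illustration only, not the authors' actual training script; the output path, optimizer variant, and logging settings are assumptions, and the dataset and Trainer wiring are omitted.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported fine-tuning setup.
# Per-device batch 8 x grad accumulation 32 x 8 GPUs = effective batch of 2048.
training_args = TrainingArguments(
    output_dir="green-phi2-finetune",   # placeholder output path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=32,
    num_train_epochs=12,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_torch",                # Adam-family optimizer; exact betas/epsilon not documented here
    logging_steps=50,
)
```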
Key Characteristics
- Base Model: Microsoft Phi-2, known for its compact size and strong performance.
- Parameter Count: approximately 2.7 billion parameters, inherited from Phi-2.
- Context Length: 2048 tokens (a quick config check is sketched after this list).
- Validation Loss: 0.0781 at the end of fine-tuning, indicating effective learning on the fine-tuning dataset.
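The context length above can be sanity-checked directly from the published checkpoint. This is a minimal sketch assuming the standard transformers auto-classes resolve the StanfordAIMI/GREEN-Phi2 repository; older transformers releases may additionally require trust_remote_code=True for Phi-2-based checkpoints.

```python
from transformers import AutoConfig

repo_id = "StanfordAIMI/GREEN-Phi2"

# The context window is recorded in the checkpoint's config (no weight download needed).
config = AutoConfig.from_pretrained(repo_id)
print("context length:", config.max_position_embeddings)  # expected: 2048

# The parameter count can be verified once the weights are loaded, e.g.
# sum(p.numel() for p in model.parameters()) on the model from the usage sketch below.
```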
Intended Use Cases
Specific intended uses and limitations are not documented for this checkpoint. As a fine-tuned version of Phi-2, however, it is broadly suited to language understanding and generation tasks, particularly where compute or memory budgets favor a smaller model over larger LLMs.
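As a sketch of basic usage, the snippet below loads the checkpoint with the standard transformers API and generates a continuation for a prompt. The prompt, generation settings, and dtype choice are illustrative assumptions; the model card does not prescribe a prompting format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "StanfordAIMI/GREEN-Phi2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # half precision keeps the ~2.7B model within a single GPU
    device_map="auto",
)

prompt = "Summarize the key findings:"  # placeholder prompt, not an official template
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep prompt length plus generated tokens within the 2048-token context window.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```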