StanfordAIMI/GREEN-RadPhi2
Text generation · Model size: 3B · Quantization: BF16 · Context length: 2k · Published: Mar 25, 2024 · Architecture: Transformer

StanfordAIMI/GREEN-RadPhi2 is a 3-billion-parameter language model fine-tuned from StanfordAIMI/RadPhi-2. It reaches a low validation loss of 0.0816 on its fine-tuning objective, and is suited to tasks that call for a compact yet capable language model.


Model Overview

StanfordAIMI/GREEN-RadPhi2 is a 3-billion-parameter language model fine-tuned from StanfordAIMI/RadPhi-2. The model was trained for 12 epochs, achieving a final validation loss of 0.0816.
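A minimal loading sketch using the Hugging Face `transformers` library. This is an assumption: the model card does not document usage, so the exact interface (and whether `AutoModelForCausalLM` applies to this checkpoint) is not confirmed:

```python
def generate(prompt: str, model_id: str = "StanfordAIMI/GREEN-RadPhi2") -> str:
    """Hypothetical inference helper; assumes the checkpoint works with the
    standard AutoModelForCausalLM interface (not stated in the model card)."""
    # Imports are deferred so the sketch has no hard dependency until called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # the card lists BF16
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Downloading the 3B checkpoint happens on first call, so the helper is best wrapped behind a cache or service boundary in practice.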

Training Details

The training process utilized specific hyperparameters to optimize performance:

  • Learning Rate: 0.0001
  • Batch Size: 8 per device (train and eval); gradient accumulation yields an effective train batch size of 2048
  • Optimizer: Adam with betas=(0.9, 0.95) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a 0.05 warmup ratio
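The hyperparameters above can be collected into a plain config, with the gradient-accumulation steps derived from the stated batch sizes. The device count is an assumption; the card only gives the per-device batch of 8 and the effective batch of 2048:

```python
per_device_batch = 8
effective_batch = 2048
num_devices = 1  # assumption; the model card does not state the device count
grad_accum_steps = effective_batch // (per_device_batch * num_devices)  # 256

training_config = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": per_device_batch,
    "per_device_eval_batch_size": per_device_batch,
    "gradient_accumulation_steps": grad_accum_steps,
    "num_train_epochs": 12,
    "optimizer": "adam",
    "adam_betas": (0.9, 0.95),
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
}
print(grad_accum_steps)  # 256
```

With more devices, the accumulation steps would shrink proportionally while the effective batch stays at 2048.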

Performance

Validation loss decreased over most of training, reaching a minimum of 0.0786 around epoch 8.95 before ending at 0.0816. The small gap between the best and final checkpoints suggests a largely converged fine-tuning run with a slight uptick after the minimum.
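If the reported validation loss is a mean token-level cross-entropy (a reasonable assumption for a causal language model, though the card does not say), it maps directly to perplexity via exp(loss):

```python
import math

final_loss = 0.0816  # reported final validation loss
best_loss = 0.0786   # reported minimum, around epoch 8.95

final_ppl = math.exp(final_loss)  # ≈ 1.085
best_ppl = math.exp(best_loss)    # ≈ 1.082

print(f"final perplexity ≈ {final_ppl:.3f}, best ≈ {best_ppl:.3f}")
```

Perplexities this close to 1 indicate the fine-tuning targets are highly predictable under the model, consistent with a narrow, structured output format.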

Limitations

The specific dataset used for fine-tuning is not detailed in the model card, and further information regarding intended uses and limitations is currently unavailable. Users should exercise caution and conduct thorough evaluations for specific applications.