Sirawipa/tian-ft

Task: Text Generation
Concurrency Cost: 1
Model Size: 0.6B
Quant: BF16
Ctx Length: 32k
Published: May 23, 2024
License: apache-2.0
Architecture: Transformer
Status: Open Weights, Warm
Source: Hugging Face

Sirawipa/tian-ft is a 0.6 billion parameter language model, fine-tuned from the sail/Sailor-0.5B base model. It was trained for 10 epochs with a peak learning rate of 0.0002 and reached a final validation loss of 0.3696. Its intended uses and primary differentiators are not detailed in the available documentation.

Model Overview

Sirawipa/tian-ft is a 0.6 billion parameter language model, fine-tuned from the sail/Sailor-0.5B base model. The fine-tuning process involved 10 epochs, utilizing a linear learning rate scheduler with a peak learning rate of 0.0002. The model achieved a validation loss of 0.3696 on its evaluation set.
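Assuming the checkpoint is published on the Hugging Face Hub under the ID above, a minimal sketch of loading it with the transformers library could look like the following. The prompt and generation settings are illustrative, not taken from the model card.

```python
# Minimal sketch: load Sirawipa/tian-ft with Hugging Face transformers.
# Assumes the checkpoint is available on the Hub under this ID; the prompt
# and generation parameters below are illustrative, not documented settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sirawipa/tian-ft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the quant listed above
)

prompt = "Hello,"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```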

Training Details

Training was conducted with a per-device batch size of 4 and gradient accumulation over 4 steps, for an effective batch size of 16. The Adam optimizer was used with standard betas and epsilon, and mixed-precision training (Native AMP) was enabled to reduce memory use and speed up training. Training loss decreased consistently across the run, with validation loss stabilizing toward the end of the 10 epochs.
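The reported hyperparameters map naturally onto Hugging Face TrainingArguments. Below is a plausible sketch of that configuration, assuming the model was fine-tuned with the Trainer API; the output directory is a placeholder, and only the values named above (epochs, learning rate, scheduler, batch size, accumulation, Adam settings, AMP) come from the documentation.

```python
# Sketch of a Trainer configuration matching the reported hyperparameters.
# output_dir is a placeholder; dataset and warmup settings are undocumented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tian-ft",            # placeholder, not documented
    num_train_epochs=10,             # 10 epochs, as reported
    learning_rate=2e-4,              # peak learning rate 0.0002
    lr_scheduler_type="linear",      # linear learning rate scheduler
    per_device_train_batch_size=4,   # batch size 4
    gradient_accumulation_steps=4,   # effective batch size 4 * 4 = 16
    adam_beta1=0.9,                  # standard Adam betas
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # standard Adam epsilon
    fp16=True,                       # mixed precision (Native AMP)
)
```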

Limitations

The available documentation does not specify the fine-tuning dataset, the model's intended uses, or its particular limitations. Its optimal applications, and how it compares to other models, therefore remain undetermined.