Sirawipa/tian-ft
Sirawipa/tian-ft is a 0.6-billion-parameter language model fine-tuned from the sail/Sailor-0.5B base model. It was trained for 10 epochs with a peak learning rate of 0.0002 and reached a final validation loss of 0.3696. Its intended uses and primary differentiators are not detailed in the available documentation.
Model Overview
Sirawipa/tian-ft is a 0.6 billion parameter language model, fine-tuned from the sail/Sailor-0.5B base model. The fine-tuning process involved 10 epochs, utilizing a linear learning rate scheduler with a peak learning rate of 0.0002. The model achieved a validation loss of 0.3696 on its evaluation set.
Training Details
Training was conducted with a per-device batch size of 4, accumulating gradients over 4 steps for an effective batch size of 16. The Adam optimizer was used with its standard betas (0.9, 0.999) and epsilon (1e-08). Mixed-precision training (Native AMP) was enabled to reduce memory use and speed up training. Training loss decreased consistently, with validation loss stabilizing toward the end of the 10 epochs.
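The hyperparameters above can be collected into a single configuration. This is a sketch only: the key names mirror Hugging Face `TrainingArguments` conventions, but the actual training script is not documented, so treat every name here as an assumption.

```python
# Reported fine-tuning hyperparameters for Sirawipa/tian-ft.
# Key names follow Hugging Face TrainingArguments conventions as an
# illustration; the real training script is not documented.
hparams = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,          # 0.0002, linear schedule
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    "fp16": True,                   # Native AMP mixed precision
}

# Effective batch size = per-device batch * accumulation steps
# (a single device is assumed, since no device count is documented).
effective_batch = (
    hparams["per_device_train_batch_size"]
    * hparams["gradient_accumulation_steps"]
)
print(effective_batch)  # -> 16
```

Gradient accumulation here trades wall-clock time for memory: each optimizer step sees 16 examples' worth of gradients while only 4 examples reside on the device at once.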
Limitations
The available documentation does not specify the dataset used for fine-tuning, nor does it detail the model's intended uses or known failure modes. Its optimal applications, and how it compares to other models of similar size, therefore remain undefined.