sathiiiii/polyalign-gemma2-2b-en-sft
The sathiiiii/polyalign-gemma2-2b-en-sft model is a 2.6-billion-parameter language model fine-tuned from Google's Gemma-2-2b. It was trained on the polyalign_train dataset and reached a final validation loss of 1.3660. The model targets general language understanding and generation tasks, building on the capabilities of the Gemma-2 base model.
Model Overview
This model is a fine-tuned version of Google's Gemma-2-2b base model, with approximately 2.6 billion parameters. Supervised fine-tuning was performed on the polyalign_train dataset, yielding a final validation loss of 1.3660.
Key Training Details
The model was trained for a single epoch with the following hyperparameters:
- Learning Rate: 1e-05
- Batch Size: A total effective batch size of 64 (2 per device across 8 GPUs with 4 gradient accumulation steps).
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
- Mixed Precision: Native AMP was utilized for training efficiency.
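The effective batch size and the cosine schedule above can be reproduced with a short calculation. This is a sketch: the warmup/decay formula follows the standard cosine-with-warmup schedule (as in transformers' `get_cosine_schedule_with_warmup`), and the total step count of 9000 is taken from the training log reported below; the trainer's exact internals may differ slightly.

```python
import math

# Hyperparameters from the training run described above
per_device_batch_size = 2
num_gpus = 8
grad_accum_steps = 4
peak_lr = 1e-05
warmup_ratio = 0.1
total_steps = 9000  # final step count reported on this card

# Effective batch size: examples consumed per optimizer update
effective_batch_size = per_device_batch_size * num_gpus * grad_accum_steps
print(effective_batch_size)  # 64

def cosine_lr(step, total=total_steps, ratio=warmup_ratio, peak=peak_lr):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    warmup_steps = int(total * ratio)  # 900 steps at a 0.1 warmup ratio
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(900))   # peak LR (1e-05) at the end of warmup
print(cosine_lr(9000))  # decays to 0.0 at the final step
```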
Performance
During training, the validation loss decreased steadily, reaching its final value of 1.3660 after 9,000 steps.
Intended Uses
Specific intended uses and limitations are not documented. As a fine-tuned Gemma-2-2b model, however, it should be suitable for common natural language processing tasks such as text generation, summarization, and question answering, particularly those aligned with the characteristics of its fine-tuning dataset.
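For tasks like those above, the model can be loaded with the standard Hugging Face transformers causal-LM API. This is a minimal sketch, assuming `transformers` and `torch` are installed and the weights are downloadable; the prompt and generation settings are illustrative, not part of this card.

```python
# Sketch: loading sathiiiii/polyalign-gemma2-2b-en-sft for text generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sathiiiii/polyalign-gemma2-2b-en-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 2.6B model light
    device_map="auto",
)

prompt = "Summarize the main idea of supervised fine-tuning:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```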