sathiiiii/polyalign-gemma2-2b-en-sft

Text Generation · Concurrency Cost: 1 · Model Size: 2.6B · Quant: BF16 · Ctx Length: 8k · Published: Apr 20, 2026 · License: other · Architecture: Transformer

The sathiiiii/polyalign-gemma2-2b-en-sft model is a 2.6 billion parameter language model fine-tuned from Google's Gemma-2-2b base model on the polyalign_train dataset, reaching a final validation loss of 1.3660. It is intended for general language understanding and generation tasks.


Model Overview

The sathiiiii/polyalign-gemma2-2b-en-sft model is a fine-tuned version of the Google Gemma-2-2b base model with approximately 2.6 billion parameters. It was trained on the polyalign_train dataset and reached a validation loss of 1.3660.

Key Training Details

The model was trained for a single epoch with the following notable hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 1e-05
  • Batch Size: A total effective batch size of 64 (2 per device across 8 GPUs with 4 gradient accumulation steps).
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Mixed Precision: Native AMP was used for training efficiency.
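
For reference, these hyperparameters map onto a standard Hugging Face `Trainer` configuration roughly as follows. This is a minimal sketch, not the actual training script: the dataset identifier, the absence of preprocessing, and the bf16 setting are assumptions (the card reports only "Native AMP"); the hyperparameter values themselves come from the list above.

```python
# Minimal sketch of the reported setup using the Hugging Face Trainer.
# Dataset ID, preprocessing, and bf16 are assumptions; hyperparameters
# are taken from the model card.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical dataset ID; the card names the data only as "polyalign_train".
train_dataset = load_dataset("polyalign_train", split="train")

args = TrainingArguments(
    output_dir="polyalign-gemma2-2b-en-sft",
    num_train_epochs=1,                # single epoch
    learning_rate=1e-5,
    per_device_train_batch_size=2,     # 2 x 8 GPUs x 4 accum steps = 64 effective
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,                    # AdamW betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                         # mixed precision via native AMP; dtype assumed
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```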

Performance

During training, the model's validation loss progressively decreased, reaching a final value of 1.3660 after 9000 steps.

Intended Uses

The model card does not detail specific intended uses or limitations. As a fine-tuned Gemma-2-2b model, however, it is generally suitable for a range of natural language processing tasks, including text generation, summarization, and question answering, particularly those resembling its fine-tuning data. A minimal loading example follows.
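
The sketch below loads the published checkpoint with the standard transformers API; the prompt and generation settings are illustrative defaults, not values prescribed by the card.

```python
# Minimal inference sketch for the published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sathiiiii/polyalign-gemma2-2b-en-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 matches the published weight precision noted in the card metadata.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Summarize the benefits of supervised fine-tuning in two sentences."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```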