davron04/gemma-3-270m-uzen-base

Text Generation | Concurrency Cost: 1 | Model Size: 0.3B | Quant: BF16 | Ctx Length: 32k | Published: Jun 6, 2026 | Architecture: Transformer

davron04/gemma-3-270m-uzen-base is a Gemma-3 base model with roughly 0.3 billion (270M) parameters, fine-tuned by davron04. It reports a perplexity of 9.0416 on its evaluation set. As a compact model, it is best suited as a starting point for further fine-tuning on specific datasets.


Model Overview

davron04/gemma-3-270m-uzen-base is a fine-tuned variant of the Gemma-3 270M base model, developed by davron04. This model has undergone further training on an unspecified dataset, resulting in a reported loss of 2.1987 and a perplexity of 9.0416 on its evaluation set. The training process utilized a learning rate of 2e-05, a total batch size of 256, and was conducted for 1 epoch using mixed-precision training.

Key Training Details

  • Base Model: Gemma-3 270M
  • Learning Rate: 2e-05
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • LR Scheduler: Inverse square root, with warmup reported as 0.01 (a fractional value, so presumably a warmup ratio rather than a step count)
  • Epochs: 1
  • Batch Size: 2 (per device), 256 (total effective batch size)
  • Mixed Precision: Native AMP enabled
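
The batch-size and scheduler entries above can be made concrete with a little arithmetic. The sketch below is illustrative only: the device counts and the `warmup_steps=100` default are hypothetical (the source gives neither a device count nor a total step count), and `inverse_sqrt_lr` shows one common form of an inverse square root schedule, not necessarily the exact one used in this run.

```python
import math


def grad_accum_steps(total_batch, per_device_batch, num_devices):
    """Accumulation steps such that
    per_device_batch * num_devices * steps == total_batch."""
    per_step = per_device_batch * num_devices
    if total_batch % per_step != 0:
        raise ValueError("total batch is not divisible by the per-step batch")
    return total_batch // per_step


def inverse_sqrt_lr(step, base_lr=2e-5, warmup_steps=100):
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step).

    warmup_steps=100 is a hypothetical value chosen for illustration.
    """
    warmup = max(1, warmup_steps)
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * math.sqrt(warmup / step)


# Per-device batch 2 and total batch 256, for two hypothetical hardware setups.
print(grad_accum_steps(256, 2, 1))  # single device: 128 accumulation steps
print(grad_accum_steps(256, 2, 8))  # eight devices: 16 accumulation steps

# Learning rate during warmup, at its peak, and after decay.
for step in (50, 100, 400):
    print(step, inverse_sqrt_lr(step))
```

Whatever the hardware, the product of device count and accumulation steps has to equal 128 for a per-device batch of 2 to reach an effective batch of 256.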

Performance Metrics

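At the end of training, the model reported a validation loss of 2.1987 and a perplexity of 9.0416 on its evaluation set. Because the evaluation data is not specified, these figures describe the model only on that data and are not directly comparable with results on other benchmarks.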
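
Perplexity is conventionally the exponential of the mean per-token cross-entropy loss (in nats). As a quick sanity check, assuming that convention applies here, the sketch below converts the reported loss to a perplexity and the reported perplexity back to a loss. The two reported numbers turn out to be close but not exactly consistent; the source does not explain the gap, which may simply mean they were computed at different points or averaged differently.

```python
import math

reported_loss = 2.1987
reported_perplexity = 9.0416


def perplexity_from_loss(mean_nll):
    """Perplexity implied by a mean negative log-likelihood in nats."""
    return math.exp(mean_nll)


implied_perplexity = perplexity_from_loss(reported_loss)
implied_loss = math.log(reported_perplexity)

print(f"perplexity implied by loss 2.1987: {implied_perplexity:.4f}")  # 9.0133
print(f"loss implied by perplexity 9.0416: {implied_loss:.4f}")        # 2.2018
```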

Intended Uses & Limitations

Specific intended uses and limitations are not documented. As a fine-tuned base model, it is generally suited to text generation and can serve as a foundation for further domain-specific adaptation or research.
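
For readers who want to try the model for plain text generation, a minimal sketch follows. It assumes the weights are published on the Hugging Face Hub under the same identifier and that the installed `transformers` and `torch` versions support Gemma-3; neither is confirmed by the source. The `generate` helper and its defaults are hypothetical names introduced for illustration.

```python
MODEL_ID = "davron04/gemma-3-270m-uzen-base"  # assumed to be a Hugging Face Hub repo id


def generate(prompt, max_new_tokens=40):
    """Greedy-decode a continuation of `prompt` with the fine-tuned base model."""
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Once upon a time")` downloads the weights on first use and returns the prompt followed by its continuation. Since this is a base model, it continues text rather than following instructions.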