TheHassanSaud/ramzan_sft_gemma3_with_updated_templat

Vision · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Published: Jan 11, 2026 · License: other · Architecture: Transformer

TheHassanSaud/ramzan_sft_gemma3_with_updated_templat is a 12-billion-parameter language model fine-tuned by TheHassanSaud from ramzanniaz331/gemma3-12b-2048-v3. It supports a 32,768-token context length and was specialized through supervised fine-tuning on five datasets: ramzan_5k_batch_1, ramzan_5k_batch_2, ramzan_openhermes, ramzan_metamath, and ramzan_aya_urdu. It is intended for general language generation tasks that draw on this combined training data.


Model Overview

The model builds on the ramzanniaz331/gemma3-12b-2048-v3 base and was adapted through supervised fine-tuning (SFT) on a combination of datasets: ramzan_5k_batch_1, ramzan_5k_batch_2, ramzan_openhermes, ramzan_metamath, and ramzan_aya_urdu.
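
For context, here is a minimal loading sketch using Hugging Face transformers; it assumes the repository is hosted on the Hub under the name above and exposes a standard Gemma 3 causal-LM interface (the class choice and dtype are assumptions, not confirmed by this model card):

```python
# Minimal loading sketch (assumes a standard causal-LM layout on the Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheHassanSaud/ramzan_sft_gemma3_with_updated_templat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # adjust for your hardware; FP8 serving needs dedicated support
    device_map="auto",           # requires the accelerate package
)
```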

Key Training Details

The model was trained with the following hyperparameters (sketched as a TrainingArguments config after this list):

  • Learning Rate: 5e-06
  • Batch Size: 1 per device (train), 8 per device (eval), with 8 gradient accumulation steps, for a total effective batch size of 64 (implying 8 training devices: 1 × 8 accumulation × 8 devices).
  • Optimizer: ADAMW_TORCH_FUSED with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.03 warmup ratio.
  • Epochs: 2.0
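
As a reference point, the sketch below maps these settings onto Hugging Face TrainingArguments. This is a hedged reconstruction, not the author's actual training script; the output path is a placeholder and dataset wiring is omitted:

```python
# Hedged reconstruction of the reported hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ramzan_sft_gemma3",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 1 x 8 x 8 devices (implied) = effective batch of 64
    num_train_epochs=2.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_torch_fused",       # default betas and epsilon
)
```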

Intended Use Cases

Given its fine-tuning on a variety of datasets, this model is suitable for:

  • General text generation and understanding tasks.
  • Applications requiring knowledge from the specific datasets it was trained on, such as mathematical reasoning (from ramzan_metamath) and potentially Urdu-language understanding (from ramzan_aya_urdu); a short inference sketch follows this list.
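
Building on the loading sketch above, here is a short inference example. It assumes the tokenizer ships the updated chat template that the model's name alludes to; the prompt is an arbitrary illustration:

```python
# Inference sketch; assumes `model` and `tokenizer` from the loading example
# and that the tokenizer defines a chat template.
messages = [{"role": "user", "content": "What is 17 * 24? Show your reasoning."}]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```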

Further details on specific intended uses, limitations, and comprehensive evaluation data are not provided in the current model card.