TheHassanSaud/ramzan_sft_gemma3_with_updated_templat
TheHassanSaud/ramzan_sft_gemma3_with_updated_templat is a 12-billion-parameter language model fine-tuned by TheHassanSaud from ramzanniaz331/gemma3-12b-2048-v3. It supports a 32,768-token context length and was specialized through supervised fine-tuning on a diverse set of datasets, including ramzan_5k_batch_1, ramzan_5k_batch_2, ramzan_openhermes, ramzan_metamath, and ramzan_aya_urdu. It is intended for general language-generation tasks that draw on the combined knowledge of its training data.
Model Overview
TheHassanSaud/ramzan_sft_gemma3_with_updated_templat is a 12-billion-parameter language model built on the ramzanniaz331/gemma3-12b-2048-v3 base model. It has undergone supervised fine-tuning (SFT) on a comprehensive set of datasets: ramzan_5k_batch_1, ramzan_5k_batch_2, ramzan_openhermes, ramzan_metamath, and ramzan_aya_urdu.
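A minimal inference sketch using the Hugging Face transformers library is shown below. This is an assumption-based example, not an officially documented usage snippet: it assumes the repository ships a standard tokenizer with a Gemma-style chat template, and that you have enough GPU memory for a 12B model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TheHassanSaud/ramzan_sft_gemma3_with_updated_templat"


def build_messages(user_prompt: str) -> list[dict]:
    """Single-turn chat message in the format expected by apply_chat_template."""
    return [{"role": "user", "content": user_prompt}]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Loading a 12B model needs substantial GPU memory; device_map="auto"
    # requires the `accelerate` package to be installed.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", torch_dtype="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


# Example call (not run here; downloads the full model):
# print(generate("Solve: what is 17 * 23?"))
```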
Key Training Details
The model was trained with the following hyperparameters:
- Learning Rate: 5e-06
- Batch Size: 1 per device (train), 8 (eval), with 8 gradient accumulation steps; the reported total effective train batch size is 64, which implies 8 devices (1 × 8 accumulation × 8 devices).
- Optimizer: ADAMW_TORCH_FUSED with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a 0.03 warmup ratio.
- Epochs: 2.0
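The effective batch size above can be checked with a quick calculation. Note that the device count is not stated in the model card; it is inferred here from the reported total of 64:

```python
# Effective train batch size under gradient accumulation (sketch).
per_device_train_batch_size = 1
gradient_accumulation_steps = 8

# Inferred from the reported total of 64; the card does not state the device count.
num_devices = 64 // (per_device_train_batch_size * gradient_accumulation_steps)

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(num_devices, effective_batch_size)  # 8 64
```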
Intended Use Cases
Given its fine-tuning on a variety of datasets, this model is suitable for:
- General text generation and understanding tasks.
- Applications requiring knowledge from the specific datasets it was trained on, such as mathematical reasoning (from ramzan_metamath) and potentially multilingual understanding, particularly Urdu (from ramzan_aya_urdu).
Further details on specific intended uses, limitations, and comprehensive evaluation data are not provided in the current model card.