ramzanniaz331/gemma3-12b-2048-ds2-sft-v3

  • Capabilities: Vision
  • Concurrency Cost: 1
  • Model Size: 12B
  • Quant: FP8
  • Ctx Length: 32k
  • Published: Dec 31, 2025
  • License: other
  • Architecture: Transformer

The ramzanniaz331/gemma3-12b-2048-ds2-sft-v3 model is a 12-billion-parameter language model fine-tuned from ramzanniaz331/gemma3-12b-2048-v3. It was fine-tuned on a diverse set of datasets, including ramzan_5k_batch_1, ramzan_5k_batch_2, ramzan_openhermes, ramzan_metamath, and ramzan_aya_urdu, and is designed for general language-generation tasks, with the breadth of its fine-tuning data intended to give it broad applicability.


Model Overview

ramzanniaz331/gemma3-12b-2048-ds2-sft-v3 is a 12-billion-parameter language model built on the base model ramzanniaz331/gemma3-12b-2048-v3. This version has undergone extensive fine-tuning across multiple datasets to enhance its capabilities.

Key Fine-tuning Details

The model was fine-tuned using a combination of five distinct datasets:

  • ramzan_5k_batch_1
  • ramzan_5k_batch_2
  • ramzan_openhermes
  • ramzan_metamath
  • ramzan_aya_urdu

This diverse training regimen suggests an aim for broad language understanding and generation, potentially covering general conversational, mathematical, and multilingual (Urdu) contexts. The training utilized a learning rate of 5e-06, a total batch size of 64, and ran for 2 epochs, employing a cosine learning rate scheduler with a warmup ratio of 0.03.
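
These hyperparameters map directly onto the standard Hugging Face TrainingArguments API. The sketch below is illustrative only: the split of the total batch size of 64 into per-device batch size and gradient accumulation steps is not reported, and the output directory and precision setting are assumptions.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the reported fine-tuning hyperparameters.
training_args = TrainingArguments(
    output_dir="gemma3-12b-2048-ds2-sft-v3",  # assumed name, not reported
    learning_rate=5e-6,                       # reported
    num_train_epochs=2,                       # reported
    lr_scheduler_type="cosine",               # reported
    warmup_ratio=0.03,                        # reported
    per_device_train_batch_size=8,            # assumption: 8 per device x 8 accumulation = 64 total
    gradient_accumulation_steps=8,            # assumption; only the total batch size of 64 is reported
    bf16=True,                                # assumption: bf16 mixed precision is typical for 12B fine-tunes
)
```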

Technical Specifications

  • Parameters: 12 Billion
  • Base Model: ramzanniaz331/gemma3-12b-2048-v3
  • Frameworks: Transformers 4.57.1, PyTorch 2.9.1+cu128, Datasets 4.0.0, Tokenizers 0.22.1
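
With the listed framework versions, the checkpoint should load through the standard Transformers auto classes. The snippet below is a minimal loading sketch; the card does not state whether the repository ships the text-only or the multimodal Gemma 3 variant (in the latter case the image-text-to-text classes would apply instead), and the dtype choice is an assumption.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ramzanniaz331/gemma3-12b-2048-ds2-sft-v3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: adjust to the published weight dtype (the card lists an FP8 quant)
    device_map="auto",           # requires accelerate; spreads the 12B weights across available devices
)
```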

Intended Use Cases

Given its fine-tuning on varied datasets, this model is likely suitable for a range of general-purpose language tasks, including text generation, summarization, and potentially question-answering, especially in domains covered by its training data.
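
As a concrete illustration of the text-generation use case, the following sketch continues from the loading example above; it assumes the tokenizer ships a chat template, which the card does not state.

```python
# Continues from the loading sketch; assumes a chat template is bundled with the tokenizer.
messages = [
    {"role": "user", "content": "Summarize the key ideas of supervised fine-tuning in two sentences."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```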