Model Overview
ramzanniaz331/gemma3-12b-2048-ds2-sft-v3 is a 12-billion-parameter language model built on the base model ramzanniaz331/gemma3-12b-2048-v3. As the `sft` suffix in the name indicates, this version was produced by supervised fine-tuning (SFT) on multiple datasets to broaden its capabilities.
Key Fine-tuning Details
The model was fine-tuned using a combination of five distinct datasets (a hedged loading sketch follows the list):
- ramzan_5k_batch_1
- ramzan_5k_batch_2
- ramzan_openhermes
- ramzan_metamath
- ramzan_aya_urdu
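The card does not state where these datasets are hosted, so the hub paths below are assumptions. As a minimal sketch, a mixture like this could be assembled with the `datasets` library (version 4.0.0, per the specifications below):

```python
from datasets import load_dataset, concatenate_datasets

# Hypothetical hub paths: the "ramzanniaz331/" prefix is an assumption;
# the card only gives the dataset names.
DATASET_NAMES = [
    "ramzan_5k_batch_1",
    "ramzan_5k_batch_2",
    "ramzan_openhermes",
    "ramzan_metamath",
    "ramzan_aya_urdu",
]

# Load each training split and concatenate into a single mixture.
# concatenate_datasets requires the datasets to share a schema.
parts = [load_dataset(f"ramzanniaz331/{name}", split="train") for name in DATASET_NAMES]
train_mixture = concatenate_datasets(parts).shuffle(seed=42)
```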
This mixture suggests an aim for broad language understanding and generation: general conversation (ramzan_openhermes), mathematical reasoning (ramzan_metamath), and multilingual, specifically Urdu, text (ramzan_aya_urdu). Training used a learning rate of 5e-06, a total batch size of 64, and 2 epochs, with a cosine learning-rate scheduler and a warmup ratio of 0.03; a hedged configuration sketch follows.
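As a rough illustration, these hyperparameters map onto Hugging Face `TrainingArguments` as shown below. Only the total batch size (64) is documented; the 8 × 8 split into per-device batch size and gradient-accumulation steps, and the bf16 precision, are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma3-12b-2048-ds2-sft-v3",
    learning_rate=5e-06,                # stated learning rate
    per_device_train_batch_size=8,      # assumed split of the total batch size
    gradient_accumulation_steps=8,      # 8 * 8 = 64 total batch size (stated)
    num_train_epochs=2,                 # stated epoch count
    lr_scheduler_type="cosine",         # stated scheduler
    warmup_ratio=0.03,                  # stated warmup ratio
    bf16=True,                          # assumed precision for a 12B model
)
```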
Technical Specifications
- Parameters: 12 Billion
- Base Model: ramzanniaz331/gemma3-12b-2048-v3
- Frameworks: Transformers 4.57.1, PyTorch 2.9.1+cu128, Datasets 4.0.0, Tokenizers 0.22.1
Intended Use Cases
Given its fine-tuning on varied datasets, this model is likely suited to general-purpose language tasks such as text generation, summarization, and question answering, particularly in the conversational, mathematical, and Urdu-language domains covered by its training data. A minimal usage sketch follows.
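For instance, here is a minimal text-generation sketch using the Transformers pipeline API, assuming the checkpoint loads as a standard causal language model:

```python
from transformers import pipeline

# device_map="auto" places the 12B model across available accelerators;
# torch_dtype="auto" uses the dtype stored in the checkpoint.
generator = pipeline(
    "text-generation",
    model="ramzanniaz331/gemma3-12b-2048-ds2-sft-v3",
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Summarize the key ideas of gradient descent in two sentences."
result = generator(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```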