tzchen07/ShieldGemma-2B-SFT-X9c

Hugging Face
TEXT GENERATION | Concurrency Cost: 1 | Model Size: 2.6B | Quant: BF16 | Ctx Length: 8k | Published: May 26, 2026 | License: other | Architecture: Transformer | Warm

ShieldGemma-2B-SFT-X9c by tzchen07 is a 2.6 billion parameter instruction-tuned causal language model, fine-tuned from jxm/shieldgemma-2b. It supports an 8192-token context length and was trained with supervised fine-tuning on the v1_6_plus_v1_6b_plus_v1_6c dataset. It builds on the Gemma architecture and is described as targeting general language understanding and generation tasks.

Model Overview

ShieldGemma-2B-SFT-X9c is a 2.6 billion parameter language model developed by tzchen07. It is a supervised fine-tuned (SFT) version of the jxm/shieldgemma-2b base model, which suggests it is tuned for instruction-following and conversational tasks. The model was trained with a learning rate of 5e-06, a batch size of 4, and a cosine learning rate scheduler for 2 epochs.
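
To illustrate what a cosine learning rate scheduler does with the reported peak rate of 5e-06, here is a minimal sketch in plain Python. It assumes no warmup phase and decay to zero, neither of which is stated in the model card, and the total step count in the usage lines is a hypothetical value chosen only for illustration (the dataset size is not given).

```python
import math


def cosine_lr(step, total_steps, base_lr=5e-6, min_lr=0.0):
    """Cosine-annealed learning rate at a given optimizer step.

    Assumptions (not from the model card): no warmup, decay to min_lr.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so the schedule stays flat outside [0, total_steps].
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length; the real number of steps depends on the
# dataset size, which the model card does not report.
TOTAL_STEPS = 1000
start_lr = cosine_lr(0, TOTAL_STEPS)
mid_lr = cosine_lr(TOTAL_STEPS // 2, TOTAL_STEPS)
end_lr = cosine_lr(TOTAL_STEPS, TOTAL_STEPS)
```

Under these assumptions the rate starts at the peak value, falls to half of it midway through training, and approaches zero at the final step.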

Key Training Details

  • Base Model: jxm/shieldgemma-2b
  • Dataset: v1_6_plus_v1_6b_plus_v1_6c. Its contents are not documented here; it is presumably conversational or instructional data.
  • Parameters: 2.6 billion
  • Context Length: 8192 tokens
  • Optimizer: AdamW (the beta and epsilon values are not listed here).
  • Frameworks: Transformers 4.57.1, PyTorch 2.4.0+cu121, and Datasets 3.6.0.
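
Given the frameworks listed above, loading the model would most likely go through the standard Transformers auto classes. The sketch below assumes the checkpoint is published on the Hugging Face Hub under the id in the page title and that BF16 weights are wanted, matching the listed quantization. The helper names generate_text and fits_context are hypothetical, introduced only for this example.

```python
MODEL_ID = "tzchen07/ShieldGemma-2B-SFT-X9c"  # assumed Hub repository id
MAX_CONTEXT_TOKENS = 8192  # context length reported in the model card


def fits_context(prompt_tokens, max_new_tokens):
    """Return True if the prompt plus the generation budget fits the context window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT_TOKENS


def generate_text(prompt, max_new_tokens=128):
    """Load the model and generate a continuation for a single prompt.

    Imports are local so that the helpers above can be used without
    torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_tokens = inputs["input_ids"].shape[1]
    if not fits_context(prompt_tokens, max_new_tokens):
        raise ValueError("prompt plus max_new_tokens exceeds the context window")

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][prompt_tokens:], skip_special_tokens=True)
```

Calling generate_text("...") downloads the weights on first use. The model card does not document a prompt format, so a plain-text prompt is used here rather than a chat template.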

Intended Use Cases

While specific intended uses are not detailed in the provided README, as an instruction-tuned model, ShieldGemma-2B-SFT-X9c is generally suitable for tasks requiring:

  • Following instructions to generate text.
  • Engaging in conversational AI.
  • General text generation and understanding under resource constraints that suit a 2.6B-parameter model.
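
As a rough guide to those resource constraints, the memory needed just to hold the weights can be estimated from the parameter count and the BF16 format (2 bytes per parameter). This back-of-the-envelope sketch treats the parameter count as exactly 2.6 billion and ignores activations, the KV cache, and framework overhead, so actual usage will be higher.

```python
def weight_memory_bytes(n_params, bytes_per_param=2):
    """Bytes needed to store the weights alone (BF16 uses 2 bytes per parameter)."""
    return n_params * bytes_per_param


N_PARAMS = 2_600_000_000  # 2.6B, as reported in the model card (rounded)

total_bytes = weight_memory_bytes(N_PARAMS)
total_gb = total_bytes / 1e9      # decimal gigabytes: 5.2
total_gib = total_bytes / 2**30   # binary gibibytes: roughly 4.84
```

By this estimate the weights occupy about 5.2 GB, which fits on many consumer GPUs with headroom left for inference.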

Further evaluation would be needed to determine its specific strengths and limitations across various benchmarks.