tzchen07/ShieldGemma-2B-SFT-X9c
ShieldGemma-2B-SFT-X9c by tzchen07 is a 2.6 billion parameter instruction-tuned causal language model, fine-tuned from jxm/shieldgemma-2b. It supports an 8192-token context length and was trained with supervised fine-tuning on the v1_6_plus_v1_6b_plus_v1_6c dataset. It is designed for general language understanding and generation tasks and builds on the Gemma architecture.
Model Overview
ShieldGemma-2B-SFT-X9c is a 2.6 billion parameter language model developed by tzchen07. It is a supervised fine-tuned (SFT) version of the jxm/shieldgemma-2b base model, which suggests it is tuned for instruction-following and conversational tasks. The model was trained for 2 epochs with a learning rate of 5e-06, a batch size of 4, and a cosine learning rate scheduler.
Key Training Details
- Base Model: jxm/shieldgemma-2b
- Dataset: Fine-tuned on the v1_6_plus_v1_6b_plus_v1_6c dataset, suggesting a focus on diverse conversational or instructional data.
- Parameters: 2.6 billion
- Context Length: 8192 tokens
- Optimizer: AdamW (the exact beta and epsilon values are not listed here).
- Frameworks: Transformers 4.57.1, PyTorch 2.4.0+cu121, and Datasets 3.6.0.
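To make the schedule concrete, here is a minimal sketch of how a cosine learning rate decays over a run with the stated peak of 5e-06, batch size of 4, and 2 epochs. It assumes no warmup, decay to zero, a single device, and no gradient accumulation, none of which the card confirms. The function names and the dataset size of 1000 examples are hypothetical and used only for illustration.

```python
import math


def steps_per_run(num_examples: int, batch_size: int = 4, epochs: int = 2) -> int:
    """Optimizer steps, assuming one device and no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float = 5e-6) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = steps_per_run(num_examples=1000)  # hypothetical dataset size
    for step in (0, total // 2, total):
        print(step, cosine_lr(step, total))
```

Under these assumptions the learning rate starts at the peak, passes through half the peak at the midpoint of training, and reaches zero at the final step.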
Intended Use Cases
While specific intended uses are not detailed in the provided README, as an instruction-tuned model, ShieldGemma-2B-SFT-X9c is generally suitable for tasks requiring:
- Following instructions to generate text.
- Engaging in conversational AI.
- General text generation and understanding where a 2.6B-parameter model fits the available compute and memory.
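As a rough guide to the resource side, the sketch below estimates the memory needed just to hold 2.6 billion weights at a few common precisions. It counts weights only; activations, the KV cache, and framework overhead are extra, and the precision the released checkpoint is stored in is not stated in the card. The helper name is hypothetical.

```python
# Bytes used per parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GiB (excludes activations and KV cache)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(2.6e9, dtype):.2f} GiB")
```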
Further evaluation would be needed to determine its specific strengths and limitations across various benchmarks.
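For a quick qualitative check, the following is a hedged sketch of loading the model with the Transformers auto classes and generating text. It assumes the checkpoint is available under the repository id shown in the title, that a plain text prompt is acceptable (the card does not document a chat template or prompt format), and that default loading settings fit on your hardware. The helper names `clamp_new_tokens` and `generate` are hypothetical.

```python
MODEL_ID = "tzchen07/ShieldGemma-2B-SFT-X9c"  # repo id taken from the card title
MAX_CONTEXT = 8192  # context length stated in the card


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     max_context: int = MAX_CONTEXT) -> int:
    """Limit generation so that prompt plus output fits in the context window."""
    return max(0, min(requested, max_context - prompt_tokens))


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    # Imported lazily so the helper above works without these packages installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    budget = clamp_new_tokens(prompt_len, max_new_tokens)
    if budget == 0:
        raise ValueError("Prompt already fills the context window.")

    output = model.generate(**inputs, max_new_tokens=budget)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `generate("Summarize the following paragraph: ...")` downloads the weights on first use, so expect several gigabytes of traffic and disk space.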