tzchen07/SG_X9e
tzchen07/SG_X9e is a 2.6-billion-parameter language model fine-tuned from jxm/shieldgemma-2b on five datasets (v1.6, v1.6b, v1.6c, v1.6d, and v1.6e). Its 8192-token context length makes it usable for tasks that involve longer input sequences.
Model Overview
tzchen07/SG_X9e is a 2.6-billion-parameter language model fine-tuned from the jxm/shieldgemma-2b base model. Fine-tuning used a series of five datasets: v1.6, v1.6b, v1.6c, v1.6d, and v1.6e. The original documentation does not describe the contents of these datasets or which capabilities the fine-tuning was meant to improve.
Training Details
The model was trained with a learning rate of 5e-06, a batch size of 4, and 16 gradient accumulation steps, giving an effective total batch size of 64 (4 × 16). Training used the fused PyTorch AdamW optimizer (adamw_torch_fused) for 2 epochs, with a cosine learning rate scheduler and a warmup ratio of 0.1. The training environment was Transformers 4.57.1, PyTorch 2.10.0+cu129, Datasets 4.0.0, and Tokenizers 0.22.2.
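To make the batch-size arithmetic and the learning rate schedule concrete, here is a minimal pure-Python sketch. It is not the author's training script. The function names, the single-device default, and the total step count used in the usage note are assumptions for illustration; the schedule follows the common linear-warmup-then-cosine-decay shape, which may differ in detail from what the training run actually used.

```python
import math

# Values reported in the Training Details paragraph.
PEAK_LR = 5e-6
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.1


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples contributing to one optimizer update.

    num_devices=1 is an assumption; the source does not state the device count.
    """
    return per_device_batch * grad_accum_steps * num_devices


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With a hypothetical run of 1000 optimizer steps, `lr_at_step(100, 1000)` is the peak of 5e-06 (end of warmup), and the rate then decays toward zero by step 1000.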
Intended Use
Specific intended uses and limitations are not documented. The model's ShieldGemma base and its fine-tuning on multiple datasets suggest it may suit specialized language understanding and generation tasks, but this is an inference, not a documented claim. Developers should check that its 2.6B parameter size and 8192-token context length fit the memory and sequence-length requirements of their application.
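As a rough aid for judging whether those two specifications fit an application, the sketch below estimates weight memory from the parameter count and computes how many tokens remain for generation after a prompt. The helper names and the dtype table are hypothetical. The memory figure covers weights only and ignores activations, KV cache, and framework overhead, so real usage will be higher.

```python
# Figures taken from the model description.
PARAM_COUNT = 2.6e9
CONTEXT_LENGTH = 8192

# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(param_count=PARAM_COUNT, dtype="bfloat16"):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


def max_new_tokens_available(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Tokens left for generation once the prompt occupies part of the context."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_length - prompt_tokens)
```

For example, the weights take roughly 5.2 GB in bfloat16 and 10.4 GB in float32, and a prompt of 8000 tokens leaves 192 tokens for the response.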