jsl5710/Shield-Gemma-3-1B-Full-FT-CE
Text Generation | Concurrency Cost: 1 | Model Size: 1B | Quant: BF16 | Ctx Length: 32k | Published: Apr 9, 2026 | License: gemma | Architecture: Transformer
The jsl5710/Shield-Gemma-3-1B-Full-FT-CE is a 1-billion-parameter model fine-tuned by jsl5710 from Gemma-3-1B-IT. It functions as a specialized safety classifier, trained on the DIA-GUARD dataset to identify harmful content across 48 English dialects. Its primary strength is robustly classifying prompts as 'safe' or 'unsafe', which makes it suitable for filtering and for research in dialect-aware LLM safety.
Overview
The jsl5710/Shield-Gemma-3-1B-Full-FT-CE is a 1-billion-parameter model derived from google/gemma-3-1b-it and fine-tuned by jsl5710 as part of the Shield project. Its core function is to act as a safety classifier for Large Language Models (LLMs), distinguishing between safe and unsafe prompts.
Key Capabilities
- Dialect-Aware Safety Classification: Trained on the extensive DIA-GUARD dataset, which comprises approximately 836,000 records across 48 English dialects, enabling robust classification of harmful content.
- High Test Accuracy: Achieves a test accuracy of 0.9670 and a macro-averaged F1 score of 0.9669 on the DIA-GUARD holdout test split (181,874 samples).
- Knowledge Distillation Student: Designed to serve as a student model in knowledge distillation experiments (MINILLM / GKD / TED), leveraging its specialized safety classification abilities.
- Research Baseline: Provides a strong baseline for academic and applied research into dialect-aware safety mechanisms for LLMs.
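Accuracy and macro-averaged F1, the two figures reported above, are standard classification metrics. The sketch below shows how they can be computed for the two labels, assuming reference and predicted labels are the strings 'safe' and 'unsafe'. It is illustrative only; the card does not publish the evaluation script used to produce its numbers.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = ("safe", "unsafe"),  # assumed label strings
) -> float:
    """Unweighted mean of the per-class F1 scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); treated as 0 if the class never occurs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    # "Macro" averaging weights each class equally, regardless of its size
    return sum(scores) / len(scores)


y_true = ["safe", "safe", "unsafe", "unsafe"]
y_pred = ["safe", "unsafe", "unsafe", "unsafe"]
print(accuracy(y_true, y_pred))  # 0.75
print(macro_f1(y_true, y_pred))  # mean of 2/3 (safe) and 0.8 (unsafe)
```

Because macro averaging gives each class equal weight, a macro-F1 close to accuracy suggests the classifier is not trading off one label's performance for the other's.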
Intended Use Cases
- Safety Filtering: Classifying user input prompts as 'safe' or 'unsafe' to prevent harmful content generation.
- LLM Safety Research: Investigating and developing more effective safety measures, particularly with respect to linguistic diversity.
- Knowledge Distillation: Acting as a component in larger knowledge distillation pipelines for training smaller, specialized models.
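Distillation methods such as those listed above train a student to match a teacher's output distribution. As an illustration only, the sketch below computes the two divergences most often contrasted in this setting for a single output position: forward KL, KL(teacher || student), as in standard word-level distillation, and reverse KL, KL(student || teacher), which the MINILLM objective is built on. The function names and example logits are hypothetical. The full methods (sequence-level sampling, on-policy data, layer-wise losses) are considerably more involved and are not shown, and this is not the training code used for this model.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities; subtract the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def forward_kl(teacher_logits: Sequence[float], student_logits: Sequence[float]) -> float:
    """KL(teacher || student): penalizes the student for missing teacher mass."""
    return kl(softmax(teacher_logits), softmax(student_logits))


def reverse_kl(teacher_logits: Sequence[float], student_logits: Sequence[float]) -> float:
    """KL(student || teacher): penalizes the student for mass the teacher lacks."""
    return kl(softmax(student_logits), softmax(teacher_logits))


# Hypothetical logits over two label tokens, e.g. ("safe", "unsafe")
teacher = [2.0, -1.0]
student = [0.5, 0.0]
print(forward_kl(teacher, student), reverse_kl(teacher, student))
```

The two directions generally give different values for the same pair of distributions, which is why the choice between them matters when a small student cannot represent the teacher exactly.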
Limitations
- Inherits limitations and biases from its base model, Gemma-3-1B-IT.
- Performance is primarily validated on English dialects; its efficacy on non-English text is not guaranteed.
- Not recommended as the sole safety mechanism in production environments.
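Since the card advises against relying on this classifier alone, one option is to use it as one layer of a fail-closed filter. The sketch below assumes a caller-supplied `classify` function (hypothetical) that runs the model on a prompt and returns its generated text, and assumes that text begins with the word 'safe' or 'unsafe'. The card does not document the prompt template or the exact output format, so verify both against the model's actual behavior before relying on this.

```python
from typing import Callable, Optional, Sequence

# Hypothetical: sends a prompt to the classifier and returns its raw generated text.
Classifier = Callable[[str], str]
# Hypothetical: any additional independent check (blocklist, second model, ...).
Check = Callable[[str], bool]


def parse_label(raw: str) -> Optional[str]:
    """Map raw output to 'safe' or 'unsafe' (assumed format); None if unrecognized."""
    text = raw.strip().lower()
    if text.startswith("unsafe"):
        return "unsafe"
    if text.startswith("safe"):
        return "safe"
    return None


def is_allowed(prompt: str, classify: Classifier, extra_checks: Sequence[Check] = ()) -> bool:
    """Allow a prompt only if the classifier says 'safe' AND every extra check passes."""
    label = parse_label(classify(prompt))
    # Fail closed: block on 'unsafe' and on any output that cannot be parsed.
    if label != "safe":
        return False
    # Defense in depth: the classifier is one layer, not the only one.
    return all(check(prompt) for check in extra_checks)


# Usage with stand-in functions in place of the real model call:
blocklist = lambda p: "forbidden" not in p.lower()
print(is_allowed("hello", lambda p: "safe", [blocklist]))          # True
print(is_allowed("forbidden", lambda p: "safe", [blocklist]))      # False
print(is_allowed("hello", lambda p: "I'm not sure", [blocklist]))  # False
```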