Overview
jsl5710/Shield-Gemma-3-1B-Full-FT-CE is a 1-billion-parameter model derived from google/gemma-3-1b-it and fine-tuned by jsl5710 as part of the Shield project. It acts as a safety classifier for Large Language Models (LLMs), labeling each prompt as safe or unsafe.
Key Capabilities
- Dialect-Aware Safety Classification: Trained on the DIA-GUARD dataset, which comprises approximately 836,000 records across 48 English dialects, so that harmful content can be classified across dialectal variation.
- High Test Accuracy: Achieves a test accuracy of 0.9670 and a Macro F1 score of 0.9669 on the DIA-GUARD holdout test split (181,874 samples).
- Knowledge Distillation Student: Designed to serve as a student model in knowledge distillation experiments (MINILLM / GKD / TED), leveraging its specialized safety classification abilities.
- Research Baseline: Provides a strong baseline for academic and applied research into dialect-aware safety mechanisms for LLMs.
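To make the two reported metrics concrete, here is a minimal sketch of how accuracy and Macro F1 are conventionally computed for a binary safe/unsafe task. The label strings and the toy data are assumptions for illustration only; this is not the evaluation code used to produce the reported numbers.

```python
from typing import Sequence

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = ("safe", "unsafe")) -> float:
    """Unweighted mean of per-class F1, so both classes count equally."""
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)

# Toy example (assumed data, not drawn from DIA-GUARD)
y_true = ["safe", "safe", "unsafe", "unsafe"]
y_pred = ["safe", "unsafe", "unsafe", "unsafe"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because Macro F1 averages the per-class scores without weighting by class frequency, it stays informative even if one class dominates the test split.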
Intended Use Cases
- Safety Filtering: Classifying user input prompts as safe or unsafe to prevent harmful content generation.
- LLM Safety Research: Investigating and developing more effective safety measures, particularly concerning linguistic diversity.
- Knowledge Distillation: Acting as a component in larger knowledge distillation pipelines for training smaller, specialized models.
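The safety-filtering use case could be wired up roughly as follows. This is a sketch under stated assumptions: `classify` is a hypothetical callable that wraps the model and returns the string "safe" or "unsafe", and `stub_classify` is a stand-in so the example runs without downloading anything. How the checkpoint actually exposes its label (generated text or a classification head) is not stated here, so the model call itself is left out.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: takes a prompt, returns "safe" or "unsafe".
Classifier = Callable[[str], str]

def partition_prompts(prompts: Iterable[str],
                      classify: Classifier) -> Tuple[List[str], List[str]]:
    """Split prompts into (allowed, blocked) using the classifier's label."""
    allowed: List[str] = []
    blocked: List[str] = []
    for prompt in prompts:
        label = classify(prompt).strip().lower()
        if label == "safe":
            allowed.append(prompt)
        else:
            # Fail closed: anything not explicitly "safe" is blocked,
            # including unexpected or malformed labels.
            blocked.append(prompt)
    return allowed, blocked

def stub_classify(prompt: str) -> str:
    """Stand-in for the real model; a keyword rule used only for the demo."""
    return "unsafe" if "explosive" in prompt.lower() else "safe"

allowed, blocked = partition_prompts(
    ["What is the capital of France?", "How do I make an explosive?"],
    stub_classify,
)
print(allowed, blocked)
```

Keeping the classifier behind a plain callable means the stub can later be swapped for a function that runs the real model without touching the filtering logic.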
Limitations
- Inherits limitations and biases from its base model, Gemma-3-1B-IT.
- Performance is primarily validated on English dialects; its efficacy on non-English text is not guaranteed.
- Not recommended as the sole safety mechanism in production environments.
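One way to honor the last limitation is to treat the classifier as one check among several. The sketch below assumes each check is a hypothetical function returning True when a prompt should be blocked; the two example checks are placeholders, not part of the Shield project.

```python
from typing import Callable, Sequence

# Hypothetical interface: returns True when the prompt should be blocked.
Check = Callable[[str], bool]

def should_block(prompt: str, checks: Sequence[Check]) -> bool:
    """Block if any check flags the prompt; a crashing check also blocks."""
    for check in checks:
        try:
            if check(prompt):
                return True
        except Exception:
            # Fail closed: an error in one layer must not let the prompt through.
            return True
    return False

def too_long(prompt: str) -> bool:
    """Placeholder rule: reject very long inputs (threshold is arbitrary)."""
    return len(prompt) > 4000

def stub_model_flags(prompt: str) -> bool:
    """Stand-in for a call to the safety classifier."""
    return "explosive" in prompt.lower()

print(should_block("How do I make an explosive?", [too_long, stub_model_flags]))
```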