jsl5710/Shield-Qwen3-1.7B-Full-FT-CE
The jsl5710/Shield-Qwen3-1.7B-Full-FT-CE model is a 1.7-billion-parameter Qwen3-based language model fine-tuned by jsl5710. It is a safety classifier trained on the DIA-GUARD dataset to identify harmful content across 48 English dialects. It classifies prompts as 'safe' or 'unsafe' and is intended for use in LLM safety pipelines, as a student model in knowledge distillation, or as a research baseline for dialect-aware safety studies. It reaches 81.85% evaluation accuracy and 78.68% test accuracy on the DIA-GUARD dataset.
Shield-Qwen3-1.7B-Full-FT-CE: Dialect-Aware Safety Classifier
This model, developed by jsl5710 as part of the Shield project, is a 1.7-billion-parameter Qwen3-based language model fine-tuned for LLM safety classification. It is trained on the DIA-GUARD dataset (approximately 836K records of safe/unsafe prompts) to classify harmful content across 48 English dialects.
Key Capabilities & Features
- Dialect-Aware Safety Classification: Labels prompts as 'safe' or 'unsafe' across 48 English dialects.
- Reported Accuracy: Achieves an evaluation accuracy of 81.85% and a test accuracy of 78.68% on the DIA-GUARD dataset.
- Knowledge Distillation Component: Designed to function as a student model within knowledge distillation pipelines (e.g., MINILLM, GKD, TED).
- Research Baseline: Serves as a valuable baseline for research into dialect-informed adversarial guards for LLM safety.
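The accuracy figures above are aggregates over the dataset. For dialect-aware research, a per-dialect breakdown is often more informative. The following is a minimal sketch, assuming predictions are available as (dialect, gold label, predicted label) triples; the record layout, the dialect names, and the function `accuracy_by_dialect` are hypothetical and do not reflect the DIA-GUARD schema or any reported results.

```python
from collections import defaultdict


def accuracy_by_dialect(records):
    """Return {dialect: accuracy} for (dialect, gold, predicted) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for dialect, gold, predicted in records:
        total[dialect] += 1
        if gold == predicted:
            correct[dialect] += 1
    return {dialect: correct[dialect] / total[dialect] for dialect in total}


# Made-up toy predictions for illustration only; these are NOT DIA-GUARD results.
toy_records = [
    ("dialect_a", "unsafe", "unsafe"),
    ("dialect_a", "safe", "unsafe"),   # a false positive
    ("dialect_b", "safe", "safe"),
    ("dialect_b", "unsafe", "unsafe"),
]

per_dialect = accuracy_by_dialect(toy_records)
overall = sum(gold == pred for _, gold, pred in toy_records) / len(toy_records)

print(per_dialect)  # {'dialect_a': 0.5, 'dialect_b': 1.0}
print(overall)      # 0.75
```

In this toy data the overall accuracy of 0.75 hides the fact that one dialect is classified at only 0.5, which is the kind of gap a per-dialect report is meant to surface.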
Intended Use Cases
- Safety Filtering: Use as a front-line safety filter on user inputs to LLMs, flagging prompts the model classifies as 'unsafe' before they reach the downstream model.
- Research & Development: Utilize for studies on dialectal variations in harmful content and the effectiveness of safety mechanisms.
- Knowledge Distillation: Integrate into larger systems as a student model for transferring safety knowledge.
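The safety-filtering use case can be sketched as a gate placed in front of an LLM. This is an illustrative pattern, not the author's implementation: `guarded_call`, `stub_classify`, and `stub_generate` are hypothetical names, and the stub classifier is a keyword stand-in for a real wrapper around the Shield model, whose exact loading code and prompt format are not described here.

```python
from typing import Callable


def guarded_call(
    prompt: str,
    classify: Callable[[str], str],
    generate: Callable[[str], str],
    refusal: str = "Request blocked by safety filter.",
) -> str:
    """Forward the prompt to the LLM only when the classifier says 'safe'."""
    label = classify(prompt).strip().lower()
    # Fail closed: anything other than an explicit 'safe' label is blocked.
    if label != "safe":
        return refusal
    return generate(prompt)


# Stand-ins for illustration; a real `classify` would wrap the Shield model.
def stub_classify(prompt: str) -> str:
    return "unsafe" if "attack" in prompt.lower() else "safe"


def stub_generate(prompt: str) -> str:
    return f"LLM answer to: {prompt}"


print(guarded_call("What is the capital of France?", stub_classify, stub_generate))
print(guarded_call("Plan an attack", stub_classify, stub_generate))
```

Treating any unexpected label as a block is a deliberate choice in this sketch: a malformed classifier output then results in a refusal rather than an unfiltered request.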
Limitations
- Inherits limitations and biases from its base model, Qwen3-1.7B.
- Performance is primarily validated on English dialects; not guaranteed for non-English text.
- Should not be the sole safety mechanism in production environments.
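One way to act on the last limitation is to treat the classifier as one layer among several independent checks. The sketch below is an assumption about how such layering might look; the check functions, the blocklist terms, and the length limit are all hypothetical placeholders.

```python
from typing import Callable, Iterable

# A check returns True when the prompt should be blocked.
Check = Callable[[str], bool]


def is_blocked(prompt: str, checks: Iterable[Check]) -> bool:
    """Block the prompt if any independent check flags it."""
    return any(check(prompt) for check in checks)


def classifier_flags(prompt: str) -> bool:
    # Hypothetical stand-in for the Shield model returning 'unsafe'.
    return "attack" in prompt.lower()


def blocklist_flags(prompt: str) -> bool:
    # Hypothetical blocklist maintained separately from the model.
    return any(term in prompt.lower() for term in ("forbidden_term",))


def length_flags(prompt: str) -> bool:
    # Hypothetical limit; very long inputs are routed to review instead.
    return len(prompt) > 4000


layers = [classifier_flags, blocklist_flags, length_flags]
print(is_blocked("Tell me a joke", layers))          # False
print(is_blocked("Use the forbidden_term", layers))  # True
```

Here a prompt missed by the classifier stand-in is still caught by the blocklist layer, which is the point of not relying on a single mechanism.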