jsl5710/Shield-Qwen3-1.7B-Full-FT-CE

TEXT GENERATION · Model size: 2B (1.7B parameters) · Quantization: BF16 · Context length: 32k · Published: Apr 9, 2026 · License: apache-2.0 · Architecture: Transformer, open weights

The jsl5710/Shield-Qwen3-1.7B-Full-FT-CE model is a 1.7-billion-parameter Qwen3-based language model fine-tuned by jsl5710. It is designed as a safety classifier that identifies harmful content across 48 English dialects, leveraging the DIA-GUARD dataset. The model classifies prompts as 'safe' or 'unsafe' and is intended for LLM safety pipelines, for use as a student model in knowledge distillation, or as a research baseline for dialect-aware safety studies. It achieves 81.85% evaluation accuracy and 78.68% test accuracy on the DIA-GUARD dataset.


Shield-Qwen3-1.7B-Full-FT-CE: Dialect-Aware Safety Classifier

This model, developed by jsl5710 as part of the Shield project, is a 1.7 billion parameter Qwen3-based language model specifically fine-tuned for LLM safety classification. Its core strength lies in its ability to robustly classify harmful content across 48 diverse English dialects, a capability derived from training on the extensive DIA-GUARD dataset (approximately 836K records of safe/unsafe prompts).

Key Capabilities & Features

  • Dialect-Aware Safety Classification: Identifies prompts as 'safe' or 'unsafe' with high accuracy across a wide range of English dialects.
  • Robust Performance: Achieves an evaluation accuracy of 81.85% and a test accuracy of 78.68% on the DIA-GUARD dataset, demonstrating strong performance in identifying both safe and unsafe content.
  • Knowledge Distillation Component: Designed to function as a student model within knowledge distillation pipelines (e.g., MINILLM, GKD, TED).
  • Research Baseline: Serves as a valuable baseline for research into dialect-informed adversarial guards for LLM safety.
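The classification workflow described above can be sketched with the Hugging Face transformers library. This is a minimal sketch under stated assumptions: the model card does not specify the prompt template or output format, so the code assumes the fine-tuned model emits a 'safe'/'unsafe' label as generated text; the `parse_label` helper and its fail-closed default are illustrative additions, not part of the model card.

```python
def parse_label(generated_text: str) -> str:
    """Map the model's free-form output to a 'safe'/'unsafe' label.

    Assumes the fine-tuned model emits one of the two labels as text;
    defaults to 'unsafe' (fail closed) when neither label is recognized.
    """
    text = generated_text.strip().lower()
    if text.startswith("safe"):
        return "safe"
    return "unsafe"  # 'unsafe...' or anything unrecognized


def classify_prompt(prompt: str) -> str:
    """Run the classifier; requires the model weights to be available
    locally or downloadable from the Hugging Face Hub."""
    # transformers is imported lazily so parse_label stays usable without it
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "jsl5710/Shield-Qwen3-1.7B-Full-FT-CE"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=4)
    # Decode only the newly generated tokens, not the echoed prompt
    completion = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return parse_label(completion)
```

Treating unrecognized output as 'unsafe' errs on the side of caution, which is the sensible default for a front-line filter.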

Intended Use Cases

  • Safety Filtering: Deploy as a front-line safety filter that screens user inputs before they reach an LLM, flagging harmful content.
  • Research & Development: Utilize for studies on dialectal variations in harmful content and the effectiveness of safety mechanisms.
  • Knowledge Distillation: Integrate into larger systems as a student model for transferring safety knowledge.
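As a concrete example of the front-line filtering use case, the sketch below wraps any classifier callable (such as this model) in a gate that blocks unsafe prompts before they reach a downstream LLM. The function names and the stub classifier are hypothetical illustrations; in practice the stub would be replaced by a call into the Shield model, and, per the limitations below, the gate should complement rather than replace other safety mechanisms.

```python
from typing import Callable


def safety_gate(
    classify: Callable[[str], str],
    handle: Callable[[str], str],
    refusal: str = "This request was flagged as unsafe.",
) -> Callable[[str], str]:
    """Return a wrapped handler that only forwards prompts classified 'safe'.

    `classify` maps a prompt to 'safe' or 'unsafe'; any other label is
    treated as unsafe (fail closed).
    """
    def guarded(prompt: str) -> str:
        if classify(prompt) == "safe":
            return handle(prompt)
        return refusal
    return guarded


# Usage with a stub classifier standing in for the Shield model:
stub_classify = lambda p: "unsafe" if "attack" in p.lower() else "safe"
llm = safety_gate(stub_classify, lambda p: f"LLM answer to: {p}")
```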

Limitations

  • Inherits limitations and biases from its base model, Qwen3-1.7B.
  • Performance is primarily validated on English dialects; not guaranteed for non-English text.
  • Should not be the sole safety mechanism in production environments.