jsl5710/Shield-Qwen3Guard-Gen-0.6B-Full-FT-CE

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.8BQuant:BF16Ctx Length:32kPublished:Apr 9, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The jsl5710/Shield-Qwen3Guard-Gen-0.6B-Full-FT-CE is an 0.8 billion parameter Qwen3Guard-Gen model fine-tuned by jsl5710 for LLM safety classification. It is specifically designed to identify safe or unsafe prompts across 48 English dialects, leveraging the DIA-GUARD dataset. This model serves as a dialect-aware safety filter and a student model for knowledge distillation experiments.

Loading preview...

Model Overview

The jsl5710/Shield-Qwen3Guard-Gen-0.6B-Full-FT-CE is a fine-tuned safety classifier model, part of the Shield project. Built upon the Qwen3Guard-Gen-0.6B base model, it has been extensively trained using the DIA-GUARD dataset, which comprises approximately 836,000 records of safe and unsafe prompts across 48 distinct English dialects. This model's primary function is to robustly classify harmful content, making it a specialized tool for enhancing LLM safety.

Key Capabilities

  • Dialect-Aware Safety Classification: Accurately classifies input prompts as safe or unsafe with a focus on diverse English dialects.
  • Knowledge Distillation Component: Designed to function as a student model within knowledge distillation pipelines (e.g., MINILLM, GKD, TED).
  • Research Baseline: Provides a valuable baseline for research into dialect-aware safety mechanisms in large language models.

Performance Highlights

During evaluation on a 2,000-sample subset of the DIA-GUARD validation split, the model achieved an evaluation accuracy of 96.8%. On the full DIA-GUARD holdout test split (181,874 samples), it demonstrated a test accuracy of 0.5432 and a Macro F1 score of 0.3545, with strong performance in identifying 'unsafe' content (F1 of 0.7035 for 'unsafe' class).

Good For

  • Implementing safety filters for LLM applications that need to handle diverse English dialects.
  • Researchers exploring knowledge distillation techniques for safety classifiers.
  • Studies focused on the impact of dialectal variations on LLM safety and bias.