Name: jsl5710/Shield-Qwen3Guard-Gen-0.6B-Full-FT-CE API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jsl5710

Model Overview

jsl5710/Shield-Qwen3Guard-Gen-0.6B-Full-FT-CE is a fine-tuned safety classifier from the Shield project. It is built on the Qwen3Guard-Gen-0.6B base model and trained on the DIA-GUARD dataset, which comprises approximately 836,000 safe and unsafe prompts across 48 distinct English dialects. Its primary function is to classify prompts as safe or unsafe, which makes it a specialized tool for LLM safety work.

Key Capabilities

Dialect-Aware Safety Classification: Classifies input prompts as safe or unsafe, with a focus on diverse English dialects.
Knowledge Distillation Component: Designed to function as a student model within knowledge distillation pipelines (e.g., MINILLM, GKD, TED).
Research Baseline: Provides a valuable baseline for research into dialect-aware safety mechanisms in large language models.
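
As a rough illustration of the student role mentioned above, the sketch below compares a teacher and a student distribution over two labels using forward and reverse KL divergence. It is not taken from the Shield project: the logits are made up, and the mapping of objectives to methods (forward KL for classic distillation, reverse KL as commonly described for MINILLM) is an assumption about those methods rather than a statement about how this model is used.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over the labels ("safe", "unsafe"); not real model outputs.
teacher_probs = softmax([2.0, 0.5])
student_probs = softmax([1.0, 1.0])

# Forward KL: the student is penalized for missing mass the teacher assigns.
forward_kl = kl_divergence(teacher_probs, student_probs)
# Reverse KL: the student is penalized for placing mass where the teacher has little.
reverse_kl = kl_divergence(student_probs, teacher_probs)

print(f"forward KL = {forward_kl:.4f}, reverse KL = {reverse_kl:.4f}")
```

In a real pipeline these divergences would be computed over token-level distributions from the teacher and student models and minimized with respect to the student's parameters.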

Performance Highlights

On a 2,000-sample subset of the DIA-GUARD validation split, the model reached an accuracy of 96.8%. On the full DIA-GUARD holdout test split (181,874 samples), results were considerably lower: a test accuracy of 0.5432 (54.32%) and a Macro F1 score of 0.3545. The 'unsafe' class scored best, with an F1 of 0.7035.
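
To make the relationship between per-class F1 and Macro F1 concrete, here is a minimal sketch that computes both on toy labels. The data is invented for illustration and is not the model's predictions; it only shows one way a high 'unsafe' F1 can coexist with a much lower Macro F1, namely when another class is rarely predicted correctly.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1 score."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never appears.
        scores[label] = (2 * tp / denom) if denom else 0.0
    return scores

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Toy example: a classifier that labels every prompt "unsafe".
labels = ["safe", "unsafe"]
y_true = ["unsafe"] * 6 + ["safe"] * 4
y_pred = ["unsafe"] * 10

print(per_class_f1(y_true, y_pred, labels))  # {'safe': 0.0, 'unsafe': 0.75}
print(macro_f1(y_true, y_pred, labels))      # 0.375
print(accuracy(y_true, y_pred))              # 0.6
```

Because Macro F1 weights every class equally, a single poorly handled class pulls the average down regardless of how many samples it has.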

Good For

Implementing safety filters for LLM applications that need to handle diverse English dialects.
Researchers exploring knowledge distillation techniques for safety classifiers.
Studies focused on the impact of dialectal variations on LLM safety and bias.
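
For the safety-filter use case, a minimal sketch of the gating step might look like the following. It assumes the model emits a verdict line of the form "Safety: Safe", "Safety: Unsafe", or "Safety: Controversial", as the base Qwen3Guard-Gen family is documented to do; whether this fine-tuned checkpoint keeps that exact format is an assumption. The names `parse_verdict` and `allow_prompt` are hypothetical helpers, and the step that actually calls the model is left out.

```python
import re

# Assumed verdict format, borrowed from the base Qwen3Guard-Gen family.
VERDICT_RE = re.compile(r"Safety:\s*(Safe|Unsafe|Controversial)", re.IGNORECASE)

def parse_verdict(generated_text):
    """Return 'safe', 'unsafe', 'controversial', or None if no verdict is found."""
    match = VERDICT_RE.search(generated_text)
    return match.group(1).lower() if match else None

def allow_prompt(generated_text, block_on_unknown=True):
    """Decide whether a prompt may pass, given the guard model's raw output.

    Only an explicit 'safe' verdict passes. Unparseable output is blocked
    by default so that a formatting failure does not silently open the filter.
    """
    verdict = parse_verdict(generated_text)
    if verdict is None:
        return not block_on_unknown
    return verdict == "safe"

# Usage with hand-written example outputs (not real model generations).
print(allow_prompt("Safety: Safe\nCategories: None"))       # True
print(allow_prompt("Safety: Unsafe\nCategories: Violent"))  # False
print(allow_prompt("unexpected output"))                    # False
```

Failing closed on unparseable output is a design choice; given the gap between validation and test results reported above, it may also be worth logging blocked prompts for manual review.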
