Name: jsl5710/Shield-Qwen3-1.7B-Full-FT-CE API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jsl5710

Shield-Qwen3-1.7B-Full-FT-CE: Dialect-Aware Safety Classifier

This model, developed by jsl5710 as part of the Shield project, is a 1.7 billion parameter Qwen3-based language model specifically fine-tuned for LLM safety classification. Its core strength lies in its ability to robustly classify harmful content across 48 diverse English dialects, a capability derived from training on the extensive DIA-GUARD dataset (approximately 836K records of safe/unsafe prompts).

Key Capabilities & Features

Dialect-Aware Safety Classification: Identifies prompts as 'safe' or 'unsafe' with high accuracy across a wide range of English dialects.
Robust Performance: Achieves an evaluation accuracy of 81.85% and a test accuracy of 78.68% on the DIA-GUARD dataset, demonstrating strong performance in identifying both safe and unsafe content.
Knowledge Distillation Component: Designed to function as a student model within knowledge distillation pipelines (e.g., MINILLM, GKD, TED).
Research Baseline: Serves as a valuable baseline for research into dialect-informed adversarial guards for LLM safety.

Intended Use Cases

Safety Filtering: Implement as a front-line safety filter for user inputs to LLMs, ensuring harmful content is identified.
Research & Development: Utilize for studies on dialectal variations in harmful content and the effectiveness of safety mechanisms.
Knowledge Distillation: Integrate into larger systems as a student model for transferring safety knowledge.

Limitations

Inherits limitations and biases from its base model, Qwen3-1.7B.
Performance is primarily validated on English dialects; not guaranteed for non-English text.
Should not be the sole safety mechanism in production environments.

Overview

Shield-Qwen3-1.7B-Full-FT-CE: Dialect-Aware Safety Classifier

Key Capabilities & Features

Intended Use Cases

Limitations

Full Model Card (README)