PolyGuard-Qwen: Multilingual Safety Moderation
PolyGuard-Qwen is a 7.6-billion-parameter multilingual safety model developed by Priyanshu Kumar, Devansh Jain, Akhila Yerukola, Liwei Jiang, Himanshu Beniwal, Thomas Hartvigsen, and Maarten Sap. It safeguards Large Language Model (LLM) generations across 17 languages, including Chinese, Czech, English, and Hindi, a far broader range than most existing safety moderation tools, which typically cover only a handful of languages. The model classifies prompt harmfulness, response harmfulness, and response refusal, and outperforms existing state-of-the-art safety classifiers by 5.5%, making it a robust multilingual moderation layer for LLM interactions.
Key Capabilities
- Multilingual Safety Classification: Trained on PolyGuardMix, the largest multilingual safety training corpus to date with 1.91 million samples across 17 languages.
- Comprehensive Harm Detection: Classifies each interaction on three criteria: prompt harmfulness, AI assistant response harmfulness, and AI assistant response refusal (see the inference sketch after this list).
- Policy Violation Identification: Identifies specific unsafe content categories (e.g., Violent Crimes, Hate, Self-Harm, Sexual Content) when an interaction is deemed unsafe.
- State-of-the-Art Performance: Outperforms existing open-weight and commercial safety classifiers by 5.5% on various safety and toxicity benchmarks.
- Robust Evaluation: Utilizes PolyGuardPrompts, a high-quality multilingual benchmark with 29,000 samples, for rigorous evaluation of safety guardrails.
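The sketch below shows how such a classifier might be invoked through Hugging Face transformers to obtain the three verdicts for one user-assistant exchange. The model ID, system prompt, and line-per-verdict output format here are assumptions made for illustration, not the official usage; the model card's own chat template and prompt format should be treated as authoritative.

```python
# Minimal inference sketch for a PolyGuard-style safety classifier.
# The Hub ID, instruction text, and output parsing below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ToxicityPrompts/PolyGuard-Qwen"  # assumed Hub ID; verify before use

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed instruction format: the classifier reads a human request and an AI
# response and emits one verdict per line, plus violated categories if unsafe.
SYSTEM = (
    "You are a safety classifier. Given a human request and an AI response, answer:\n"
    "Harmful request: yes/no\n"
    "Response refusal: yes/no\n"
    "Harmful response: yes/no\n"
    "If unsafe, also list the violated safety categories."
)

def moderate(prompt: str, response: str) -> dict:
    """Classify one user-assistant exchange; returns the verdicts as a dict."""
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"Human request: {prompt}\nAI response: {response}"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=100, do_sample=False)
    text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
    # Parse the assumed line-per-verdict output into a small dict.
    verdicts = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            verdicts[key.strip().lower()] = value.strip()
    return verdicts
```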
Good For
- Developers and organizations requiring robust, multilingual safety guardrails for their LLM applications.
- Moderating user-LLM interactions across a diverse linguistic user base (a gating sketch follows this list).
- Identifying and categorizing harmful content and refusals in LLM outputs.
- Enhancing the safety and trustworthiness of LLMs in global deployments.
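As a usage illustration, a deployment might gate assistant replies behind the moderate() helper sketched above before they reach the user. The upstream LLM call here is a placeholder, and the gating policy is one hypothetical choice among many.

```python
# Hypothetical guardrail built on the moderate() helper from the earlier sketch.
def upstream_llm(prompt: str) -> str:
    # Placeholder for whatever model actually generates the assistant reply.
    return "Here is a detailed answer..."

user_prompt = "Tell me how to pick a lock."
assistant_reply = upstream_llm(user_prompt)

verdicts = moderate(user_prompt, assistant_reply)
if verdicts.get("harmful response") == "yes":
    # Suppress unsafe output before it reaches the user.
    assistant_reply = "Sorry, I can't help with that."
print(assistant_reply)
```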