ToxicityPrompts/PolyGuard-Qwen-Smol

0.5B parameters · BF16 · 32,768 context length · Released Feb 18, 2025 · License: cc-by-4.0

PolyGuard-Qwen-Smol: Multilingual Safety Moderation

PolyGuard-Qwen-Smol is a specialized model for multilingual safety moderation of Large Language Model (LLM) interactions. Developed by Priyanshu Kumar et al., it addresses a critical gap in multilingual safety tooling by supporting 17 languages, including Chinese, Czech, English, and Hindi.

Key Capabilities

  • Comprehensive Safety Classification: Identifies three key aspects of LLM interactions:
    • Whether the human user's prompt is harmful.
    • Whether the AI assistant's response is a refusal.
    • Whether the AI assistant's response is harmful.
  • Policy Violation Identification: Pinpoints specific unsafe content categories (S1-S14) if an interaction is deemed harmful, covering areas like violent crimes, hate speech, self-harm, and sexual content.
  • State-of-the-Art Performance: Outperforms existing open-weight and commercial safety classifiers by 5.5% across various safety and toxicity benchmarks.
  • Extensive Training Data: Trained on PolyGuardMix, the largest multilingual safety training corpus to date, comprising 1.91 million samples.
  • High-Quality Evaluation Benchmark: Evaluated using PolyGuardPrompts, a multilingual benchmark with 29,000 samples, ensuring robust assessment.
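The capabilities above amount to a structured classification over a (user prompt, assistant response) pair: three yes/no judgments plus an optional list of violated S1–S14 categories. A minimal sketch of that round trip is below. The input template, the "key: value" output format, and both helper names are illustrative assumptions, not the model's documented interface; consult the model card's usage section for the actual prompt format expected by the model (loaded, e.g., via `transformers`' `AutoModelForCausalLM` / `AutoTokenizer` with the `ToxicityPrompts/PolyGuard-Qwen-Smol` checkpoint).

```python
def build_moderation_input(user_prompt: str, assistant_response: str) -> str:
    """Assemble the text fed to the classifier.

    The template here is an assumed placeholder, not PolyGuard's
    documented prompt format.
    """
    return (
        "Human user:\n" + user_prompt + "\n\n"
        "AI assistant:\n" + assistant_response
    )


def parse_verdict(generated: str) -> dict:
    """Parse the model's generated text into the three safety judgments
    (and any violated categories), assuming one 'key: value' pair per line.
    """
    verdict = {}
    for line in generated.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            verdict[key.strip().lower()] = value.strip().lower()
    return verdict


# Parsing a hypothetical model completion:
verdict = parse_verdict(
    "Harmful request: yes\n"
    "Response refusal: no\n"
    "Harmful response: yes\n"
    "Violated categories: S1, S10"
)
```

In practice the string returned by `build_moderation_input` would be tokenized, passed through the model's `generate` method, and the decoded completion handed to `parse_verdict`; the dictionary keys then map directly onto the three classification aspects listed above.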

Good For

  • Developers and organizations requiring robust, multilingual content moderation for their LLM applications.
  • Implementing safety guardrails for LLMs operating in diverse linguistic environments.
  • Analyzing and categorizing harmful content in user prompts and LLM responses across a broad spectrum of languages.