ToxicityPrompts/PolyGuard-Qwen-Smol
Text Generation · Model Size: 0.5B · Quantization: BF16 · Context Length: 32k · Published: Feb 18, 2025 · License: cc-by-4.0 · Architecture: Transformer · Open Weights

PolyGuard-Qwen-Smol is a multilingual safety moderation model developed by Priyanshu Kumar et al. for safeguarding Large Language Model (LLM) generations across 17 languages. It is trained on PolyGuardMix, the largest multilingual safety training corpus to date (1.91M samples), and classifies prompt harmfulness, response harmfulness, and response refusal. It outperforms existing state-of-the-art open-weight and commercial safety classifiers by 5.5% on safety and toxicity benchmarks, making it well suited to robust, multilingual content moderation.


PolyGuard-Qwen-Smol: Multilingual Safety Moderation

PolyGuard-Qwen-Smol is a specialized model for multilingual safety moderation of Large Language Model (LLM) interactions. Developed by Priyanshu Kumar et al., it addresses a critical gap in multilingual safety coverage by supporting 17 languages, including Chinese, Czech, English, and Hindi.

Key Capabilities

  • Comprehensive Safety Classification: Identifies three key aspects of an LLM interaction (see the inference sketch after this list):
    • Whether the human user's prompt is harmful.
    • Whether the AI assistant's response is a refusal.
    • Whether the AI assistant's response is harmful.
  • Policy Violation Identification: Pinpoints the specific unsafe content categories (S1-S14) violated when an interaction is deemed harmful, covering areas such as violent crimes, hate speech, self-harm, and sexual content; a sample code-to-name mapping appears after this list.
  • State-of-the-Art Performance: Outperforms existing open-weight and commercial safety classifiers by 5.5% across various safety and toxicity benchmarks.
  • Extensive Training Data: Trained on PolyGuardMix, the largest multilingual safety training corpus to date, comprising 1.91 million samples.
  • High-Quality Evaluation Benchmark: Evaluated using PolyGuardPrompts, a multilingual benchmark with 29,000 samples, ensuring robust assessment.
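
The sketch below shows one way the classifier might be called with the Hugging Face transformers library. It is a minimal example under stated assumptions: the instruction wording is a hypothetical stand-in, and the exact prompt template the model expects should be taken from the model card.

```python
# Minimal sketch: moderating a user/assistant exchange with PolyGuard-Qwen-Smol.
# Assumption: the model accepts a chat-style instruction; the exact template it
# was trained with is documented in the model card and may differ from this one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ToxicityPrompts/PolyGuard-Qwen-Smol"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

user_prompt = "How do I pick a lock?"
assistant_response = "I can't help with that."

# Hypothetical instruction wording; replace with the template from the model card.
instruction = (
    "Classify the safety of the following exchange.\n"
    f"Human user: {user_prompt}\n"
    f"AI assistant: {assistant_response}"
)

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    add_generation_prompt=True,
    return_tensors="pt",
)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=100)

# The model generates text labels covering prompt harmfulness, response refusal,
# and response harmfulness, plus S1-S14 category codes when content is harmful.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```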
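
When an interaction is flagged, violated policies are reported by code. The mapping below is an assumption: the names follow the MLCommons-style hazard taxonomy (as used by Llama Guard 3), which PolyGuard's S1-S14 codes appear to mirror; verify them against the PolyGuard paper before relying on them.

```python
# Assumed S1-S14 category names (MLCommons-style taxonomy, as in Llama Guard 3);
# confirm against the PolyGuard paper, which defines the authoritative list.
SAFETY_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}

def expand_categories(codes: str) -> list[str]:
    """Turn a code string like 'S1,S10' into human-readable category names."""
    return [SAFETY_CATEGORIES.get(c.strip(), c.strip()) for c in codes.split(",")]

print(expand_categories("S1,S10"))  # ['Violent Crimes', 'Hate']
```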

Good For

  • Developers and organizations requiring robust, multilingual content moderation for their LLM applications.
  • Implementing safety guardrails for LLMs operating in diverse linguistic environments.
  • Analyzing and categorizing harmful content in user prompts and LLM responses across a broad spectrum of languages.