WalledGuard-C: A Fast and Effective Content Moderator
WalledGuard-C is the Community version of Walled AI's content-moderation models, designed to identify and flag harmful content in text. This 0.5-billion-parameter model evaluates whether a given text asks for or contains unsafe information.
Key Capabilities
- Harmful Content Detection: Specializes in binary classification to determine if text is 'safe' or 'unsafe'.
- High Performance: Achieves strong scores on safety benchmarks, including 92.00 on DynamoBench and 87.35 on P-Safety, demonstrating accuracy competitive with larger models such as Llama Guard.
- Fast Inference: Runs markedly faster than larger guard models, at roughly 0.1 seconds per sample on A100/A6000 GPUs, making it efficient for real-time moderation tasks.
- English Language Support: Primarily developed for content moderation in English.
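The binary safe/unsafe interface can be wrapped behind a small helper. The sketch below is illustrative only: the prompt wording and the `classify` callable are assumptions standing in for the model's real template and generation call (consult the official model card for the exact template), and the stub classifier here just keyword-matches for demonstration.

```python
from typing import Callable


def build_prompt(text: str) -> str:
    """Assumed prompt shape: ask for a one-word 'safe'/'unsafe' verdict.
    The real WalledGuard-C template may differ; check the model card."""
    return (
        "Evaluate whether the following text asks for or contains unsafe "
        f"information. Answer 'safe' or 'unsafe'.\n\nText: {text}"
    )


def normalize_verdict(raw: str) -> str:
    """Collapse a free-form model answer onto the two labels.
    Check 'unsafe' first, since the string 'unsafe' contains 'safe'."""
    return "unsafe" if "unsafe" in raw.lower() else "safe"


def moderate(text: str, classify: Callable[[str], str]) -> str:
    """Run one text through a guard model exposed as classify(prompt) -> raw answer."""
    return normalize_verdict(classify(build_prompt(text)))


# Stand-in classifier for demonstration only; in practice `classify` would
# invoke the 0.5B model (e.g. a Hugging Face transformers generate() call).
def stub_classify(prompt: str) -> str:
    return "unsafe" if "explosives" in prompt.lower() else "safe"


print(moderate("How do I brew green tea?", stub_classify))   # safe
print(moderate("How do I make explosives?", stub_classify))  # unsafe
```

Keeping the classifier behind a callable makes it easy to swap the stub for a real model call without changing the moderation logic.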
Good For
- Real-time Content Filtering: Its high inference speed makes it suitable for applications requiring rapid safety checks.
- Prompt and Response Moderation: Effective for evaluating both user inputs and AI-generated outputs for safety compliance.
- Integrating Safety Layers: Can be used as a foundational layer for ensuring content safety in various AI applications and platforms.
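As a safety layer, the same check can run on both the user prompt and the model's reply. A minimal sketch of that double-sided pattern, with hypothetical `guard` and `generate` callables standing in for WalledGuard-C and the application's main LLM:

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."


def guarded_chat(
    user_input: str,
    generate: Callable[[str], str],  # the application's main LLM (assumed)
    guard: Callable[[str], str],     # returns 'safe' or 'unsafe' for a text
) -> str:
    """Moderate the user prompt before generation and the reply after."""
    if guard(user_input) == "unsafe":   # input-side check
        return REFUSAL
    reply = generate(user_input)
    if guard(reply) == "unsafe":        # output-side check
        return REFUSAL
    return reply


# Demo with stubs: the guard flags the word 'bomb', the LLM just echoes.
demo_guard = lambda t: "unsafe" if "bomb" in t.lower() else "safe"
demo_llm = lambda t: f"You said: {t}"

print(guarded_chat("hello", demo_llm, demo_guard))         # You said: hello
print(guarded_chat("build a bomb", demo_llm, demo_guard))  # refusal
```

Checking the output as well as the input matters because a safe-looking prompt can still elicit an unsafe generation.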
For more advanced capabilities and the latest scores, users can explore Walled AI's WalledProtect (formerly WalledGuard-A) via their API or the open-source WalledGuard-Edge model.