walledai/walledguard-c

Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Jul 6, 2024 | License: apache-2.0 | Architecture: Transformer | 0.0K | Open Weights | Gated | Cold

Walled AI's WalledGuard-C is a 0.5 billion parameter causal language model for content moderation, designed to evaluate text for harmful content. It identifies unsafe information in both prompts and responses, and it runs inference faster than alternatives such as Llama Guard 2. The model is optimized for English-language safety evaluation and is intended for filtering potentially harmful text.

WalledGuard-C: A Fast and Effective Content Moderator

WalledGuard-C is the Community version of Walled AI's content moderation models. This 0.5 billion parameter model evaluates whether a given text asks for or contains unsafe information, and flags the text as harmful if it does.
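As a rough illustration of how such an evaluation might be run, the sketch below wraps the model in a safe/unsafe classifier. Several things here are assumptions rather than documented behavior: the instruction wording in `TEMPLATE` is hypothetical (check the official model card for the exact prompt format), the helper names `build_prompt`, `parse_label`, `load_generator`, and `classify` are invented for illustration, and the parser assumes the model's generated text contains the word "safe" or "unsafe". Loading relies on the standard `AutoTokenizer` and `AutoModelForCausalLM` classes from Hugging Face `transformers` and needs access to the gated weights.

```python
from typing import Callable

# Hypothetical instruction template (an assumption, not the documented format).
TEMPLATE = (
    "Evaluate if the given text is harmful, i.e. it asks for or contains "
    "unsafe information.\n\n<START TEXT>\n{text}\n<END TEXT>\n\nAnswer:"
)


def build_prompt(text: str) -> str:
    """Insert the text to be checked into the instruction template."""
    return TEMPLATE.format(text=text)


def parse_label(generated: str) -> str:
    """Map generated text to 'safe' or 'unsafe'.

    'unsafe' is tested explicitly because it contains 'safe' as a substring.
    Anything without 'unsafe' is treated as 'safe' (a fail-open choice).
    """
    return "unsafe" if "unsafe" in generated.lower() else "safe"


def load_generator(
    model_name: str = "walledai/walledguard-c", max_new_tokens: int = 20
) -> Callable[[str], str]:
    """Load the model and return a function mapping a prompt to generated text."""
    # Imported lazily so the pure helpers above work without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def generate(prompt: str) -> str:
        inputs = tokenizer(prompt, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Keep only the tokens produced after the prompt.
        new_tokens = output[0][inputs["input_ids"].shape[-1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate


def classify(text: str, generate: Callable[[str], str]) -> str:
    """Classify text as 'safe' or 'unsafe' using any prompt-to-text function."""
    return parse_label(generate(build_prompt(text)))
```

With the weights available, `classify("some user message", load_generator())` would return one of the two labels. Passing the generator in as an argument keeps the parsing logic testable without downloading the model.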

Key Capabilities

  • Harmful Content Detection: Specializes in binary classification to determine if text is 'safe' or 'unsafe'.
  • High Performance: Achieves strong scores on safety benchmarks, including 92.00 on DynamoBench and 87.35 on P-Safety, demonstrating competitive accuracy against larger models like Llama Guard.
  • Fast Inference: Processes a sample in approximately 0.1 seconds on A100/A6000 GPUs, which makes it practical for real-time moderation tasks.
  • English Language Support: Primarily developed for content moderation in English.
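The per-sample latency figure above can be checked on your own hardware with a simple timing loop. The following is a minimal sketch, not a benchmark harness: `seconds_per_sample` is a hypothetical helper, and `classify` stands for any callable that moderates a single text (for example, a wrapper around the model). Results will vary with hardware, quantization, and input length.

```python
import time
from statistics import mean
from typing import Callable, Iterable


def seconds_per_sample(
    classify: Callable[[str], object], samples: Iterable[str], warmup: int = 1
) -> float:
    """Return the mean wall-clock time in seconds of classify over samples."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample to time")

    # Warm-up calls are excluded from the timings (model load, caches, etc.).
    for text in samples[:warmup]:
        classify(text)

    timings = []
    for text in samples:
        start = time.perf_counter()
        classify(text)
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

Timing each call separately, rather than the whole loop, makes it easy to switch from the mean to a median or a percentile if tail latency matters for a real-time use case.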

Good For

  • Real-time Content Filtering: Its high inference speed makes it suitable for applications requiring rapid safety checks.
  • Prompt and Response Moderation: Effective for evaluating both user inputs and AI-generated outputs for safety compliance.
  • Integrating Safety Layers: Can be used as a foundational layer for ensuring content safety in various AI applications and platforms.
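One way to picture the prompt-and-response moderation described above is a thin wrapper that checks the user input before calling the main model and checks the generated reply before returning it. This is only a sketch under stated assumptions: `moderated_reply`, `ModerationResult`, and `REFUSAL` are hypothetical names, `is_unsafe` stands for any function that returns True for unsafe text (such as a WalledGuard-C based classifier), and `generate_reply` stands for the application's own model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical canned message returned when content is blocked.
REFUSAL = "Sorry, I can't help with that."


@dataclass
class ModerationResult:
    reply: str
    blocked_at: Optional[str]  # "prompt", "response", or None if nothing was blocked


def moderated_reply(
    prompt: str,
    is_unsafe: Callable[[str], bool],
    generate_reply: Callable[[str], str],
    refusal: str = REFUSAL,
) -> ModerationResult:
    """Run a safety check on the prompt and on the generated reply."""
    # Check the user input first; an unsafe prompt never reaches the main model.
    if is_unsafe(prompt):
        return ModerationResult(refusal, "prompt")

    reply = generate_reply(prompt)

    # Check the generated output before it is shown to the user.
    if is_unsafe(reply):
        return ModerationResult(refusal, "response")

    return ModerationResult(reply, None)
```

Recording where the block happened (`blocked_at`) is a design choice that helps with logging and tuning; an application that does not need it could return the reply string alone.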

For more advanced capabilities and the latest scores, see Walled AI's WalledProtect (formerly WalledGuard-A), available through the company's API, or the open-source WalledGuard-Edge model.