ServiceNow-AI/AprielGuard
ServiceNow-AI/AprielGuard is an 8 billion parameter safeguard model developed by ServiceNow-AI, built on a downscaled variant of the Apriel-1.5-15B Base. It is designed to detect and mitigate both safety risks (e.g., toxicity, bias) and security threats (e.g., prompt injections, jailbreaks) in LLM interactions, unifying these concerns under a single framework. The model handles standalone prompts, multi-turn chats, and agentic AI workflows, offering interpretable outputs through structured reasoning traces. It is optimized for robust and interpretable moderation in LLM-driven systems, supporting a context length of up to 32k tokens.
Loading preview...
What is AprielGuard?
AprielGuard is an 8 billion parameter safeguard model developed by ServiceNow-AI, designed to provide holistic moderation for large language model (LLM) interactions. Unlike traditional moderation tools, it unifies the detection and mitigation of both safety risks (such as toxicity, bias, and misinformation) and security threats (including prompt injections, jailbreaks, and indirect prompt attacks) within a single framework.
Key Capabilities
- Unified Risk Detection: Identifies both safety and adversarial risks using a single model and a shared taxonomy.
- Comprehensive Input Coverage: Processes standalone prompts, multi-turn conversations, and complex agentic AI workflows, including reasoning chains and tool-use sequences.
- Interpretable Outputs: Features a "reasoning mode" that provides structured reasoning traces to justify its predictions, enhancing auditability and human-in-the-loop moderation.
- Compact and Deployable: Optimized for integration into production pipelines and evaluation stacks, offering a lightweight solution.
- Performance: Evaluated across diverse safety and adversarial benchmarks, demonstrating strong performance in identifying various threats.
When to Use AprielGuard
AprielGuard is ideal for applications requiring robust and interpretable moderation in LLM systems. This includes:
- Content moderation and risk classification for LLM-based assistants.
- Real-time model monitoring and observability in production environments.
- Red-teaming and adversarial testing to assess resilience against jailbreaks or injections.
- Safety assessment for agentic workflows, particularly those involving tool-use and API execution.
For faster inference and lower computational cost in real-time deployments, the model can operate in a non-reasoning mode, providing only categorical predictions.