DynaGuard-8B: Dynamic Guardrail Model
DynaGuard-8B is an 8-billion-parameter model developed by the University of Maryland and Capital One, built on the Qwen3-8B base architecture. It is designed to act as a dynamic guardrail model, evaluating text against user-defined natural-language policies rather than a fixed set of harm categories. This allows highly customizable, application-specific moderation, such as preventing a customer service bot from issuing refunds or a medical bot from giving unauthorized advice.
Key Features
- Dynamic Policies: Enforces arbitrary guardrail policies written in natural language.
- Interpretability: Generates natural-language explanations for policy violations, aiding in chatbot recovery and human-in-the-loop refinement.
- Dual-Mode Inference: Supports both a Fast Inference mode for direct PASS/FAIL classification and a Chain-of-Thought (CoT) mode that produces detailed reasoning traces.
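The dual-mode setup can be illustrated with a small sketch. Note that the exact prompt template DynaGuard-8B expects is not specified here, so `build_guard_prompt` and its wording are hypothetical; it only shows the general shape of pairing a natural-language policy with a transcript and toggling between a fast PASS/FAIL verdict and a CoT trace.

```python
# Hypothetical sketch: the real DynaGuard-8B prompt format may differ.
def build_guard_prompt(policy: str, conversation: str, use_cot: bool = False) -> str:
    """Assemble a guardrail prompt from a natural-language policy and a
    chat transcript. With use_cot=True we ask for a reasoning trace;
    otherwise we request a bare PASS/FAIL verdict (fast mode)."""
    instruction = (
        "Think step by step, then give a final PASS or FAIL verdict."
        if use_cot
        else "Respond with exactly PASS or FAIL."
    )
    return (
        f"Policy:\n{policy}\n\n"
        f"Conversation:\n{conversation}\n\n"
        f"{instruction}"
    )

# Example: a custom, application-specific policy in plain English.
policy = "The assistant must never promise or issue refunds."
chat = "User: I want my money back.\nAssistant: Sure, refund issued!"
prompt = build_guard_prompt(policy, chat)
```

The resulting string would then be sent to the model via whatever serving stack is in use; only the decoded verdict (and, in CoT mode, the reasoning trace) needs to be parsed from the output.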
Performance
DynaGuard-8B demonstrates state-of-the-art performance on safety and compliance benchmarks, including the novel DynaBench dataset. It surpasses other dedicated guardian models and even strong generalist models like GPT-4o-mini in F1 scores on DynaBench, achieving 72.5 F1 (73.1 F1 with CoT) while maintaining high accuracy on traditional safety benchmarks.
Training
The model was fine-tuned using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) on a mixture of the DynaBench dataset and several safety benchmarks (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).
Good For
- Implementing flexible and custom content moderation for chatbots.
- Ensuring compliance with specific operational or ethical guidelines in AI applications.
- Scenarios requiring transparent and explainable policy enforcement.
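The scenarios above can be sketched as a simple moderation loop. This is a hypothetical integration sketch, not the model's actual API: `guard_check` stands in for a real call to DynaGuard-8B and is stubbed here so the control flow is runnable; a deployment would replace it with an inference call and parse the model's PASS/FAIL verdict and explanation.

```python
# Hypothetical stub standing in for a DynaGuard-8B inference call.
def guard_check(policy: str, reply: str) -> tuple[bool, str]:
    """Return (passed, explanation). A real deployment would send the
    policy and candidate reply to DynaGuard-8B and parse its verdict;
    this stub just flags replies mentioning refunds."""
    if "refund" in reply.lower():
        return False, "Reply promises a refund, which the policy forbids."
    return True, ""

def moderated_reply(policy: str, candidate: str, fallback: str) -> str:
    """Serve the candidate reply if it passes the guard; otherwise serve
    the fallback so the chatbot can recover from the violation. The
    explanation could also be logged for human-in-the-loop review."""
    passed, explanation = guard_check(policy, candidate)
    return candidate if passed else fallback

policy = "The assistant must never promise or issue refunds."
safe = moderated_reply(policy, "Your order ships tomorrow.",
                       "Let me connect you with an agent.")
blocked = moderated_reply(policy, "Refund issued!",
                          "Let me connect you with an agent.")
```

Because violations come with a natural-language explanation, the same loop can feed that explanation back to the chatbot to regenerate a compliant reply instead of falling back to a canned response.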