DynaGuard-8B: Dynamic Guardrail Model

DynaGuard-8B is an 8-billion-parameter model developed by the University of Maryland and Capital One, built upon the Qwen3-8B base architecture. It acts as a dynamic guardrail model: it evaluates text against user-defined natural-language policies rather than a fixed set of harm categories. This enables application-specific moderation, such as preventing a customer service bot from issuing refunds or a medical bot from giving unauthorized advice.

Key Features

Dynamic Policies: Enforces arbitrary guardrail policies written in natural language.
Interpretability: Generates natural-language explanations for policy violations, aiding in chatbot recovery and human-in-the-loop refinement.
Dual-Mode Inference: Supports both a Fast Inference mode for direct PASS/FAIL classification and a Chain-of-Thought (CoT) mode for detailed reasoning traces.
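
To make the two inference modes concrete, here is a minimal sketch of how a caller might assemble a policy-plus-transcript prompt and read back a PASS/FAIL verdict. The prompt layout, the function names (build_prompt, parse_verdict), and the Verdict type are all assumptions made for illustration; the model's actual prompt template should be taken from its official model card. No model is called here, so the parsing step runs on a hand-written output string.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str        # "PASS" or "FAIL"
    explanation: str  # free-text reasoning; empty when the output is only a label


def build_prompt(rules: list[str], dialogue: list[tuple[str, str]], use_cot: bool) -> str:
    """Assemble a prompt from rules and dialogue turns (hypothetical layout)."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    turns = "\n".join(f"{speaker}: {text}" for speaker, text in dialogue)
    # Assumed instructions: CoT mode asks for reasoning, fast mode for a bare label.
    instruction = (
        "Reason step by step, then answer PASS or FAIL."
        if use_cot
        else "Answer with PASS or FAIL only."
    )
    return f"Policy:\n{numbered}\n\nTranscript:\n{turns}\n\n{instruction}"


def parse_verdict(output: str) -> Verdict:
    """Treat the last PASS/FAIL token as the label and the text before it as the explanation."""
    matches = list(re.finditer(r"\b(PASS|FAIL)\b", output))
    if not matches:
        raise ValueError("no PASS/FAIL label found in model output")
    last = matches[-1]
    return Verdict(label=last.group(1), explanation=output[: last.start()].strip())


prompt = build_prompt(
    rules=["Never issue refunds."],
    dialogue=[("User", "I want a refund."), ("Agent", "Refund issued.")],
    use_cot=False,
)
verdict = parse_verdict("The agent issued a refund, violating rule 1. FAIL")
```

Taking the last label in the output is a deliberate choice in this sketch: a reasoning trace may mention both words before settling on a final answer.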

Performance

DynaGuard-8B demonstrates state-of-the-art performance on safety and compliance benchmarks, including the novel DynaBench dataset. On DynaBench it achieves 72.5 F1 (73.1 F1 with CoT), surpassing other dedicated guardian models and strong generalist models such as GPT-4o-mini, while maintaining high accuracy on traditional safety benchmarks.
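
For readers unfamiliar with the metric, the sketch below shows how an F1 score over PASS/FAIL labels can be computed. It assumes FAIL (a policy violation) is the positive class and uses a tiny hand-made label set; the source does not state which class is treated as positive or how scores are averaged on DynaBench, so this illustrates the metric only and does not reproduce the reported numbers.

```python
def f1_score(gold: list[str], predicted: list[str], positive: str = "FAIL") -> float:
    """Binary F1 for the chosen positive class (assumed here to be FAIL)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)  # violations caught
    fp = sum(g != positive and p == positive for g, p in pairs)  # false alarms
    fn = sum(g == positive and p != positive for g, p in pairs)  # violations missed
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only.
gold = ["FAIL", "FAIL", "PASS", "PASS", "FAIL"]
predicted = ["FAIL", "PASS", "PASS", "FAIL", "FAIL"]
score = f1_score(gold, predicted)  # tp=2, fp=1, fn=1, so precision = recall = 2/3
```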

Training

The model was fine-tuned using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) on a mixture of the DynaBench dataset and several safety benchmarks (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).

Good For

Implementing flexible and custom content moderation for chatbots.
Ensuring compliance with specific operational or ethical guidelines in AI applications.
Scenarios requiring transparent and explainable policy enforcement.
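
As an illustration of the use cases above, the following sketch wraps a chatbot's draft reply in a guardrail check and falls back to a safe message when the check fails. The judge is passed in as a plain callable so that any guardrail model could be plugged in; guarded_reply, keyword_judge, and the fallback text are hypothetical names chosen for this example, and keyword_judge is a toy stand-in, not DynaGuard-8B.

```python
from typing import Callable

# A judge maps (policy, transcript) to (label, explanation).
Judge = Callable[[str, str], tuple[str, str]]

FALLBACK = "I'm sorry, I can't help with that request."


def guarded_reply(policy: str, user_message: str, draft_reply: str, judge: Judge) -> dict:
    """Return the draft reply only if the judge labels the exchange PASS."""
    transcript = f"User: {user_message}\nAgent: {draft_reply}"
    label, explanation = judge(policy, transcript)
    if label == "PASS":
        return {"reply": draft_reply, "blocked": False, "explanation": explanation}
    # Anything other than PASS is blocked, so an unexpected label fails closed.
    return {"reply": FALLBACK, "blocked": True, "explanation": explanation}


def keyword_judge(policy: str, transcript: str) -> tuple[str, str]:
    """Toy stand-in for a guardrail model: flags transcripts containing 'refund issued'."""
    if "refund issued" in transcript.lower():
        return "FAIL", "The agent issued a refund."
    return "PASS", ""


blocked = guarded_reply("Never issue refunds.", "Refund please.", "Refund issued.", keyword_judge)
allowed = guarded_reply(
    "Never issue refunds.",
    "Refund please.",
    "I can't issue refunds, but I can offer store credit.",
    keyword_judge,
)
```

Returning the explanation alongside the verdict keeps it available for logging or for prompting the chatbot to revise its reply, which is the kind of recovery and human-in-the-loop refinement the feature list describes.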