tomg-group-umd/DynaGuard-8B

8B
FP8
32768
License: apache-2.0
Overview

DynaGuard-8B: Dynamic Guardrail Model

DynaGuard-8B is an 8-billion-parameter model developed by the University of Maryland and Capital One and built on Qwen3-8B. It is a dynamic guardrail model: it evaluates text against user-defined policies written in natural language rather than against a fixed set of harm categories. This makes moderation customizable per application, for example preventing a customer service bot from issuing refunds or a medical bot from giving unauthorized advice.

Key Features

  • Dynamic Policies: Enforces arbitrary guardrail policies written in natural language.
  • Interpretability: Generates natural-language explanations for policy violations, aiding in chatbot recovery and human-in-the-loop refinement.
  • Dual-Mode Inference: Supports both a Fast Inference mode for direct PASS/FAIL classification and a Chain-of-Thought (CoT) mode for detailed reasoning traces.
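
The two inference modes can be illustrated with a small sketch that builds a guardrail prompt from a natural-language policy and parses the PASS/FAIL label out of the model's text output. This is only an illustration: the prompt layout, the helper names `build_guard_prompt` and `parse_verdict`, and the `Verdict` type are assumptions made here, not the model's documented template, so check the official model card for the exact format before relying on it.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str        # "PASS" or "FAIL"
    explanation: str  # whatever text surrounds the final label (empty in fast mode)


def build_guard_prompt(policy_rules, dialogue, use_cot=False):
    """Assemble a HYPOTHETICAL prompt; the real DynaGuard template may differ."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(policy_rules, 1))
    turns = "\n".join(f"{role}: {text}" for role, text in dialogue)
    mode = ("Reason step by step, then give the verdict."
            if use_cot else "Answer with the verdict only.")
    return (f"Policy:\n{rules}\n\nDialogue:\n{turns}\n\n"
            f"{mode} The verdict must be PASS or FAIL.")


def parse_verdict(output):
    """Treat the last upper-case PASS/FAIL token in the output as the verdict."""
    matches = list(re.finditer(r"\b(PASS|FAIL)\b", output))
    if not matches:
        raise ValueError("no PASS/FAIL label found in model output")
    last = matches[-1]
    # Everything except the final label is kept as the explanation.
    explanation = (output[:last.start()] + output[last.end():]).strip()
    return Verdict(label=last.group(1), explanation=explanation)
```

In fast mode the output would be just the label, so `explanation` comes back empty; in CoT mode the reasoning trace ends up in `explanation`. The string returned by `build_guard_prompt` would be passed to whatever generation call the deployment uses.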

Performance

On the new DynaBench dataset, DynaGuard-8B achieves 72.5 F1 without CoT and 73.1 F1 with CoT, surpassing other dedicated guardian models as well as strong generalist models such as GPT-4o-mini. It also maintains high accuracy on traditional safety benchmarks.
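
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it for PASS/FAIL labels, assuming FAIL (a policy violation) is the positive class; that choice and the function name `f1_score` are assumptions for illustration, and the benchmark's own evaluation script may define or average the score differently.

```python
def f1_score(y_true, y_pred, positive="FAIL"):
    """Binary F1 with `positive` as the target class (assumed to be FAIL here)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined, so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Multiplying the result by 100 gives a figure on the same scale as the scores quoted above.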

Training

The model was trained with supervised fine-tuning (SFT) and GRPO (Group Relative Policy Optimization) on a mixture of the DynaBench dataset and several safety datasets (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).

Good For

  • Implementing flexible and custom content moderation for chatbots.
  • Ensuring compliance with specific operational or ethical guidelines in AI applications.
  • Scenarios requiring transparent and explainable policy enforcement.
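
One way such a guardrail could sit in front of a chatbot is sketched below: each draft reply is checked against the policy, and on a FAIL the explanation is fed back so the chatbot can try again. The function `guarded_reply` and the `generate` and `check` callables are hypothetical placeholders for the application's chatbot call and guardrail call; nothing here is part of the model's own API.

```python
from typing import Callable, Optional, Tuple


def guarded_reply(
    user_message: str,
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str, str], Tuple[str, str]],
    max_attempts: int = 2,
    fallback: str = "Sorry, I can't help with that request.",
) -> str:
    """Return the first draft reply that passes the guardrail, else a fallback.

    generate(user_message, feedback) -> draft reply (feedback is None at first)
    check(user_message, reply)       -> (label, explanation), label "PASS" or "FAIL"
    """
    feedback = None
    for _ in range(max_attempts):
        reply = generate(user_message, feedback)
        label, explanation = check(user_message, reply)
        if label == "PASS":
            return reply
        # Hand the violation explanation back so the next draft can correct it.
        feedback = explanation
    return fallback
```

Capping the number of attempts keeps latency bounded, and the fixed fallback message ensures that a reply which failed the check is never sent.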