AI45Research/AgentDoG-Qwen3-4B

Warm
Public
4B
BF16
32768
1
Jan 20, 2026
License: apache-2.0
Hugging Face
Overview

AgentDoG-Qwen3-4B: Trajectory-Level Agent Safety Guardrail

AgentDoG-Qwen3-4B is a 4 billion parameter model developed by AI45Research, built upon the Qwen3-4B-Instruct-2507 base. It serves as a specialized guardrail framework for autonomous agents, focusing on trajectory-level risk assessment rather than single-step content moderation. This model analyzes the entire execution trace of tool-using agents to detect safety risks that emerge at any point during a multi-step process.

Key Capabilities

  • Trajectory-Level Monitoring: Evaluates multi-step agent executions, including observations, reasoning, and actions, to identify unsafe behaviors.
  • Taxonomy-Guided Diagnosis: Provides detailed risk labels (risk source, failure mode, real-world harm) and diagnoses the root cause of specific unsafe actions, tracing them to planning steps or tool selections.
  • High Performance: Outperforms existing safety guard models like LlamaGuard and Qwen3-Guard on benchmarks such as R-Judge (91.8%), ASSE-Safety (80.4%), and ATBench (92.8%).
  • Flexible Use Cases: Can function as a benchmark, a risk classifier for agent trajectories, or an integrated guard module within agent systems.

Good for

  • Developers building autonomous agents who need robust, fine-grained safety monitoring beyond simple content filtering.
  • Evaluating the safety and security of agent systems by assessing full execution trajectories.
  • Identifying and understanding the specific causes of unsafe agent behavior through detailed risk categorization.