AI45Research/AgentDoG-Qwen3-4B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Jan 20, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

The AI45Research/AgentDoG-Qwen3-4B is a 4 billion parameter model from the Qwen3 family, developed by AI45Research. It functions as a risk-aware evaluation and guarding framework for autonomous agents, specializing in trajectory-level risk assessment. This model identifies safety risks within an agent's execution trace, providing fine-grained diagnoses of risk sources, failure modes, and real-world harms. It excels at monitoring multi-step agent executions and diagnosing root causes of unsafe behavior, outperforming existing approaches on benchmarks like R-Judge, ASSE-Safety, and ATBench.

Loading preview...

AgentDoG-Qwen3-4B: Trajectory-Level Agent Safety Guardrail

AgentDoG-Qwen3-4B is a 4 billion parameter model developed by AI45Research, built upon the Qwen3-4B-Instruct-2507 base. It serves as a specialized guardrail framework for autonomous agents, focusing on trajectory-level risk assessment rather than single-step content moderation. This model analyzes the entire execution trace of tool-using agents to detect safety risks that emerge at any point during a multi-step process.

Key Capabilities

  • Trajectory-Level Monitoring: Evaluates multi-step agent executions, including observations, reasoning, and actions, to identify unsafe behaviors.
  • Taxonomy-Guided Diagnosis: Provides detailed risk labels (risk source, failure mode, real-world harm) and diagnoses the root cause of specific unsafe actions, tracing them to planning steps or tool selections.
  • High Performance: Outperforms existing safety guard models like LlamaGuard and Qwen3-Guard on benchmarks such as R-Judge (91.8%), ASSE-Safety (80.4%), and ATBench (92.8%).
  • Flexible Use Cases: Can function as a benchmark, a risk classifier for agent trajectories, or an integrated guard module within agent systems.

Good for

  • Developers building autonomous agents who need robust, fine-grained safety monitoring beyond simple content filtering.
  • Evaluating the safety and security of agent systems by assessing full execution trajectories.
  • Identifying and understanding the specific causes of unsafe agent behavior through detailed risk categorization.