Name: AI45Research/AgentDoG-Qwen3-4B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: AI45Research

AgentDoG-Qwen3-4B: Trajectory-Level Agent Safety Guardrail

AgentDoG-Qwen3-4B is a 4 billion parameter model developed by AI45Research, built upon the Qwen3-4B-Instruct-2507 base. It serves as a specialized guardrail framework for autonomous agents, focusing on trajectory-level risk assessment rather than single-step content moderation. This model analyzes the entire execution trace of tool-using agents to detect safety risks that emerge at any point during a multi-step process.

Key Capabilities

Trajectory-Level Monitoring: Evaluates multi-step agent executions, including observations, reasoning, and actions, to identify unsafe behaviors.
Taxonomy-Guided Diagnosis: Provides detailed risk labels (risk source, failure mode, real-world harm) and diagnoses the root cause of specific unsafe actions, tracing them to planning steps or tool selections.
High Performance: Outperforms existing safety guard models like LlamaGuard and Qwen3-Guard on benchmarks such as R-Judge (91.8%), ASSE-Safety (80.4%), and ATBench (92.8%).
Flexible Use Cases: Can function as a benchmark, a risk classifier for agent trajectories, or an integrated guard module within agent systems.

Good for

Developers building autonomous agents who need robust, fine-grained safety monitoring beyond simple content filtering.
Evaluating the safety and security of agent systems by assessing full execution trajectories.
Identifying and understanding the specific causes of unsafe agent behavior through detailed risk categorization.

Overview

AgentDoG-Qwen3-4B: Trajectory-Level Agent Safety Guardrail

Key Capabilities

Good for

Full Model Card (README)