MurrayTom/TS-Guard
MurrayTom/TS-Guard is a 7.6-billion-parameter guardrail model designed for step-level tool invocation safety detection. It is trained with reinforcement learning under a multi-task reward scheme focused on agent security. The model identifies harmful user requests and attack vectors within agent-environment interaction logs, detects unsafe tool invocations before execution, and provides interpretable analysis of its verdicts.
TS-Guard: Agent Security Guardrail Model
TS-Guard is a 7.6-billion-parameter model developed by MurrayTom, engineered to enhance the security of AI agents by focusing on tool invocation safety. Unlike general-purpose LLMs, TS-Guard specializes in identifying and mitigating security risks at the step level of agent operations, where individual tool calls are proposed and executed.
Key Capabilities
- Step-level Tool Invocation Safety: Detects unsafe tool calls before they are executed, preventing potential harm or misuse.
- Reinforcement Learning for Agent Security: Trained with a multi-task reward scheme specifically tailored for agent security, allowing it to learn complex attack patterns.
- Harmful Request Identification: Capable of identifying malicious user requests and attack vectors embedded within agent-environment interaction logs.
- Interpretable Analysis: Provides clear reasoning and analysis for its safety detections, aiding developers in understanding and addressing vulnerabilities.
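To make the pre-execution check pattern above concrete, the sketch below formats one step of an agent log into a guardrail prompt and parses a safety verdict from the model's reply. Note the caveats: the prompt template, the `VERDICT: safe`/`VERDICT: unsafe` labels, and the helper names are illustrative assumptions, not TS-Guard's documented interface; consult the actual model card for its input and output format.

```python
import json

# NOTE: this template and the verdict labels are illustrative assumptions,
# not TS-Guard's documented prompt format.
GUARD_TEMPLATE = """You are a tool-invocation safety auditor.
User request:
{request}
Proposed tool call:
{tool_call}
Answer with a verdict line ("VERDICT: safe" or "VERDICT: unsafe")
followed by a short analysis."""


def format_guard_prompt(request: str, tool_name: str, tool_args: dict) -> str:
    """Render one agent step (request + proposed tool call) into a prompt."""
    call = json.dumps({"name": tool_name, "arguments": tool_args}, indent=2)
    return GUARD_TEMPLATE.format(request=request, tool_call=call)


def parse_verdict(model_output: str) -> bool:
    """Return True if the guardrail flags the step as unsafe."""
    for line in model_output.splitlines():
        if line.strip().upper().startswith("VERDICT:"):
            return "unsafe" in line.lower()
    return True  # fail closed: a missing verdict is treated as unsafe
```

Failing closed on a missing verdict is a deliberate design choice for a safety component: an unparseable reply should block the tool call rather than let it through.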
Good For
- Securing AI Agents: Ideal for developers building AI agents that interact with external tools or environments, ensuring safe operation.
- Preventing Malicious Tool Use: Acts as a critical safeguard against agents executing harmful or unauthorized actions.
- Enhancing Agent Robustness: Improves the overall security posture of AI systems by proactively identifying and mitigating risks.
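One way a guardrail like this slots into an agent loop is as a mandatory gate in front of every tool execution. The wrapper below is a generic sketch of that pattern, not the project's API: `guard_fn` stands in for an actual TS-Guard inference call (e.g. formatting a prompt, running the model, and parsing its verdict).

```python
from typing import Any, Callable


class ToolBlockedError(RuntimeError):
    """Raised when the guardrail vetoes a tool invocation."""


def guarded_call(
    guard_fn: Callable[[str, str, dict], bool],  # returns True if unsafe
    tool_fn: Callable[..., Any],
    request: str,
    tool_name: str,
    **kwargs: Any,
) -> Any:
    """Consult the guardrail before executing the tool, never after."""
    if guard_fn(request, tool_name, kwargs):
        raise ToolBlockedError(f"guardrail blocked {tool_name}({kwargs!r})")
    return tool_fn(**kwargs)
```

Because the check runs before `tool_fn`, an unsafe call is stopped at the step level rather than audited after the damage is done, which is the property the capabilities above describe.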