WhitzardAgent/Thought-Aligner-7B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:May 2, 2025Architecture:Transformer0.0K Warm

Thought-Aligner-7B by WhitzardAgent, Shanghai Innovation Institute (SII), and Fudan University is a 7.6 billion parameter model fine-tuned from Qwen2.5-7B-Instruct, designed as a lightweight defense module for enhancing agent behavioral safety. It performs real-time causal intervention on an agent's internal reasoning (thoughts) to correct unsafe patterns before actions are executed, without interrupting the execution flow. This model excels at mitigating risky decisions, unsafe tool use, and privacy threats, achieving over 90% agent safety across various benchmarks and validated in real-world deployments. With a 32768 token context length, it offers a plug-and-play architecture for diverse LLM backends and agent frameworks.

Loading preview...

Overview of Thought-Aligner-7B

Thought-Aligner-7B, developed by WhitzardAgent, Shanghai Innovation Institute (SII), and Fudan University, is a 7.6 billion parameter model fine-tuned from Qwen2.5-7B-Instruct. It functions as a lightweight, add-on defense module specifically designed to enhance the behavioral safety of tool-using agents. Unlike traditional safety mechanisms that intervene at the output stage, Thought-Aligner operates by performing real-time causal intervention on an agent's internal reasoning process (thoughts), correcting potentially unsafe thoughts before actions are executed.

Key Capabilities and Features

  • Thought-level Correction: Mitigates high-risk reasoning patterns directly at the thought stage, preventing unsafe actions without interrupting the agent's execution flow.
  • High Safety Gains: Achieves over 90% overall agent safety across benchmarks like ToolEmu, Agent-SafetyBench, AgentHarm, AgentDojo, and InjecAgent, outperforming other defenses by approximately 23% on average.
  • Real-world Validation: Demonstrated effectiveness in practical applications through validation on the OpenClaw real-world deployment.
  • Low-latency and Low-intrusion: Designed for smooth integration into existing reasoning and execution pipelines, addressing risks at their source while preserving utility.
  • Plug-and-Play Architecture: Easily adaptable to various LLM backends and agent frameworks with minimal deployment overhead.
  • Efficiency: Available in a 7.6B parameter variant, with a 1.5B variant achieving per-thought repair latency below 100 ms.

Why Choose Thought-Aligner-7B?

Thought-Aligner-7B introduces a new paradigm for agent safety defense by focusing on pre-execution thought correction. It is ideal for developers looking to:

  • Improve agent behavioral safety in applications involving tool use.
  • Reduce risky decisions and privacy-threatening behaviors in AI agents.
  • Integrate a low-latency, utility-preserving safety mechanism into existing agent systems.
  • Deploy a solution validated not only on benchmarks but also in real-world control loops.