lixiaoxi45/DeepAgent-QwQ-32B
DeepAgent-QwQ-32B is a 32.8-billion-parameter deep reasoning agent developed by Xiaoxi Li et al. that performs autonomous thinking, tool discovery, and action execution. It is designed to overcome the limitations of predefined workflows by maintaining a global view of the task and discovering tools dynamically. The model features autonomous memory folding to manage long-horizon interactions and a ToolPO reinforcement learning strategy for general tool use. It is reported to outperform baselines on both general tool-use tasks and downstream applications.
DeepAgent-QwQ-32B Overview
DeepAgent-QwQ-32B is a 32.8-billion-parameter model developed by Xiaoxi Li et al. as an end-to-end deep reasoning agent. It is designed for autonomous thinking, dynamic tool discovery, and action execution within a unified reasoning process. The model addresses the limitations of predefined workflows by maintaining a global task perspective and using tools adaptively.
Key Capabilities
- Unified Agentic Reasoning: Operates with a single stream of thought, autonomously reasoning about tasks and discovering necessary tools.
- Autonomous Memory Folding: Compresses past interactions into structured episodic, working, and tool memories to manage long-horizon interactions and prevent context explosion.
- ToolPO Strategy: Employs an end-to-end reinforcement learning approach specifically tailored for general tool use, leveraging LLM-simulated APIs and fine-grained credit assignment for tool invocation.
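To make the memory-folding idea above more concrete, here is a minimal sketch that compresses an interaction history into the three memory types named in the list. It is an illustration under stated assumptions, not DeepAgent's implementation: the `FoldedMemory` class, the `fold` function, and the step fields (`thought`, `tool`, `ok`) are all hypothetical, and simple rules stand in for the LLM-written summaries a real agent would produce.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FoldedMemory:
    """Hypothetical container for the three memory types (not DeepAgent's schema)."""
    episodic: List[str] = field(default_factory=list)  # condensed log of older steps
    working: List[str] = field(default_factory=list)   # most recent thoughts
    tool: Dict[str, Dict[str, int]] = field(default_factory=dict)  # per-tool stats


def fold(history: List[dict], keep_recent: int = 2) -> FoldedMemory:
    """Compress a step history into a FoldedMemory.

    Each step is assumed to be a dict with keys "thought", "tool", and "ok".
    Rule-based summaries are used here purely for illustration.
    """
    memory = FoldedMemory()

    # Tool memory: count calls and failures per tool over the whole history.
    for step in history:
        stats = memory.tool.setdefault(step["tool"], {"calls": 0, "failures": 0})
        stats["calls"] += 1
        if not step["ok"]:
            stats["failures"] += 1

    # Split the history into older steps and the most recent ones.
    older = history[:-keep_recent] if keep_recent else history
    recent = history[-keep_recent:] if keep_recent else []

    # Episodic memory: one short line per older step.
    memory.episodic = [
        f"{s['tool']}: {'ok' if s['ok'] else 'failed'}" for s in older
    ]
    # Working memory: the thoughts from the recent steps, kept verbatim.
    memory.working = [s["thought"] for s in recent]
    return memory


history = [
    {"thought": "Find the movie id", "tool": "search_movie", "ok": True},
    {"thought": "Fetch the cast", "tool": "get_cast", "ok": False},
    {"thought": "Retry the cast lookup", "tool": "get_cast", "ok": True},
]
memory = fold(history, keep_recent=1)
```

After folding, the agent would continue from the compact `memory` object instead of the full history, which is how folding keeps the context from growing without bound.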
Performance and Use Cases
DeepAgent has been evaluated on general tool-use benchmarks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE). It is reported to outperform baseline models consistently in both labeled-tool and open-set tool retrieval scenarios, which makes it a candidate for applications that need autonomous reasoning and dynamic tool integration.