DeepAgent-QwQ-32B Overview

DeepAgent-QwQ-32B is a 32.8-billion-parameter model developed by Xiaoxi Li et al. as an end-to-end deep reasoning agent. It performs autonomous thinking, dynamic tool discovery, and action execution within a single unified reasoning process. Rather than following a predefined workflow, it maintains a global view of the task and selects tools adaptively.

Key Capabilities

Unified Agentic Reasoning: Operates with a single stream of thought, autonomously reasoning about tasks and discovering necessary tools.
Autonomous Memory Folding: Compresses past interactions into structured episodic, working, and tool memories to manage long-horizon interactions and prevent context explosion.
ToolPO Strategy: Employs an end-to-end reinforcement learning approach specifically tailored for general tool use, leveraging LLM-simulated APIs and fine-grained credit assignment for tool invocation.
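
To make the memory-folding idea concrete, the following is a minimal sketch of how an interaction history might be compressed into episodic, working, and tool memories. It is not the authors' implementation: the `FoldedMemory` class, the `fold` function, the step format (`thought`, `tool`, `ok`), and the rule-based summarization are all assumptions made for illustration. A real agent would more plausibly have a language model write the summary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FoldedMemory:
    """Hypothetical container for the three memory types named above."""
    episodic: List[str] = field(default_factory=list)  # notable past events
    working: List[str] = field(default_factory=list)   # most recent thoughts
    tool: Dict[str, Dict[str, int]] = field(default_factory=dict)  # per-tool stats


def fold(history: List[dict], keep_recent: int = 2) -> FoldedMemory:
    """Compress a list of interaction steps into a FoldedMemory.

    Each step is assumed to be a dict with keys "thought", "tool", and "ok".
    Simple rules stand in for the summarization a real system would perform.
    """
    memory = FoldedMemory()
    for step in history:
        # Tool memory: count calls and failures for each tool.
        stats = memory.tool.setdefault(step["tool"], {"calls": 0, "failures": 0})
        stats["calls"] += 1
        if not step["ok"]:
            stats["failures"] += 1
            # Episodic memory: record only the failures as notable events.
            memory.episodic.append(f"{step['tool']} failed: {step['thought']}")
    # Working memory: keep only the last few thoughts verbatim.
    memory.working = [step["thought"] for step in history[-keep_recent:]]
    return memory


history = [
    {"thought": "search flights", "tool": "search", "ok": True},
    {"thought": "book flight", "tool": "booking", "ok": False},
    {"thought": "retry booking", "tool": "booking", "ok": True},
]
folded = fold(history)
print(folded)
```

The folded object is much smaller than the raw history and could replace it in the context window, which is the intended effect of folding in this sketch.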

Performance and Use Cases

DeepAgent has been evaluated on general tool-use benchmarks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE). It consistently outperforms baseline models in both labeled-tool and open-set tool retrieval scenarios, which makes it suited to applications that require autonomous reasoning and dynamic tool integration.
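
The difference between the two scenarios can be illustrated with a small sketch. In a labeled-tool setting the agent is handed the relevant tools, whereas in an open-set setting it must first find candidates in a larger pool. The example below is an assumption-laden illustration, not DeepAgent's retriever: the `retrieve_tools` function, the tool names, and the word-overlap scoring are hypothetical, and a practical system would likely use dense embeddings instead.

```python
from typing import Dict, List


def retrieve_tools(query: str, tool_docs: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank tools by word overlap between the query and each tool description."""
    query_words = set(query.lower().split())
    scored = []
    for name, description in tool_docs.items():
        overlap = len(query_words & set(description.lower().split()))
        scored.append((overlap, name))
    # Sort by overlap (descending), then by name for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop tools that share no words with the query.
    return [name for overlap, name in scored[:top_k] if overlap > 0]


# Hypothetical tool pool for the open-set case.
tool_pool = {
    "movie_search": "search movies by title",
    "play_track": "play a music track",
    "weather": "get the weather forecast for a city",
}

open_set_choice = retrieve_tools("search for movies about space", tool_pool, top_k=1)
print(open_set_choice)
```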
