Alibaba-NLP/WebDancer-32B

Public
32B
FP8
32768
Jun 23, 2025
License: MIT
Hugging Face
Overview

WebDancer-32B: Autonomous Information Seeking Agent

WebDancer-32B is a 32-billion-parameter model developed by Alibaba-NLP for autonomous information seeking and reasoning. It is a native agentic search model built on the ReAct framework, which interleaves reasoning steps with tool calls such as web search, and it targets deep-research-style tasks.
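
To make the ReAct pattern concrete, here is a minimal sketch of a reason-act-observe loop. It is not WebDancer's actual inference code: the `Thought:` / `Action: tool[arg]` / `Final Answer:` text format, the `react_loop` function, and the stubbed `scripted_llm` and `fake_search` helpers are all assumptions made for illustration. In a real deployment the stub would be replaced by calls to the model and a real search tool, using whatever prompt format the model was trained on.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical text format for this sketch; the real model may use different tags.
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate model steps and tool calls until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)          # model emits a thought plus an action or answer
        transcript += step + "\n"

        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()

        action = ACTION_RE.search(step)
        if action is None:
            break                        # nothing to execute, so stop early

        name, arg = action.group(1), action.group(2)
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool: {name}"
        transcript += f"Observation: {observation}\n"   # feed the result back to the model
    return None


def scripted_llm(transcript: str) -> str:
    """Stand-in for the model: searches once, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: search[capital of France]"
    return "Thought: The observation answers it.\nFinal Answer: Paris"


def fake_search(query: str) -> str:
    """Stand-in for a web search tool."""
    return "Paris is the capital of France."


answer = react_loop("What is the capital of France?", scripted_llm, {"search": fake_search})
print(answer)
```

The `max_steps` cap is a design choice of this sketch: it bounds the number of tool calls so that a model that never produces a final answer cannot loop indefinitely.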

Key Capabilities

  • Autonomous Search and Reasoning: The model is trained to autonomously acquire and apply search and reasoning skills, making it suitable for complex, multi-step information retrieval tasks.
  • Four-Stage Training Paradigm: Training proceeds in four stages: browsing data construction, trajectory sampling, supervised fine-tuning for an effective cold start, and reinforcement learning for improved generalization.
  • Data-Centric Approach: Combines trajectory-level supervised fine-tuning with reinforcement learning (DAPO) in a scalable pipeline for training agentic systems.
  • Strong Benchmark Performance: Achieves Pass@3 scores of 61.1% on GAIA and 54.6% on WebWalkerQA, indicating proficiency in challenging web-based question answering and task execution.
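
For readers unfamiliar with the Pass@3 metric quoted above, the sketch below shows one common way to compute Pass@k: a question counts as solved if at least one of its first k attempts is correct. The `pass_at_k` helper and the toy `attempts` data are hypothetical, and the exact evaluation protocol behind the reported GAIA and WebWalkerQA numbers is not described here, so treat this as an illustration of the metric rather than a reproduction of the benchmark setup.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of questions with at least one correct answer in the first k attempts."""
    if not results:
        raise ValueError("results must contain at least one question")
    solved = 0
    for question_id, attempts in results.items():
        if len(attempts) < k:
            raise ValueError(f"{question_id} has fewer than {k} attempts")
        if any(attempts[:k]):
            solved += 1
    return solved / len(results)


# Toy data: per-question correctness of three independent attempts.
attempts = {
    "q1": [False, True, False],
    "q2": [False, False, False],
    "q3": [True, True, True],
    "q4": [False, False, True],
}

score = pass_at_k(attempts, 3)
print(f"Pass@3 = {score:.1%}")  # 3 of 4 questions solved -> 75.0%
```

Because any single success counts, Pass@3 is always at least as high as Pass@1 on the same attempts, which matters when comparing scores reported under different values of k.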

Good For

  • Autonomous Agents: Ideal for building agents that need to independently search for and process information from the web.
  • Complex Information Retrieval: Suited for tasks requiring multi-step reasoning and interaction with web environments.
  • Research and Analysis Automation: Can be applied to automate aspects of research by autonomously seeking and synthesizing information.