Alibaba-NLP/WebDancer-32B
Hugging Face
TEXT GENERATIONConcurrency Cost:2Model Size:32BQuant:FP8Ctx Length:32kPublished:Jun 23, 2025License:mitArchitecture:Transformer0.1K Open Weights Warm

Alibaba-NLP/WebDancer-32B is a 32 billion parameter agentic search reasoning model developed by Alibaba-NLP, designed for autonomous information seeking. It utilizes a ReAct framework and a four-stage training paradigm to acquire autonomous search and reasoning skills. This model excels at complex web-based tasks, achieving strong performance on benchmarks like GAIA and WebWalkerQA.

Loading preview...

WebDancer-32B: Autonomous Information Seeking Agent

WebDancer-32B is a 32 billion parameter model developed by Alibaba-NLP, specifically engineered for autonomous information seeking and reasoning. It operates as a native agentic search model, leveraging the ReAct framework to enable deep research-like capabilities.

Key Capabilities

  • Autonomous Search and Reasoning: The model is trained to autonomously acquire and apply search and reasoning skills, making it suitable for complex, multi-step information retrieval tasks.
  • Four-Stage Training Paradigm: Its development involved a unique training process including browsing data construction, trajectory sampling, supervised fine-tuning for effective cold start, and reinforcement learning for improved generalization.
  • Data-Centric Approach: Integrates trajectory-level supervision fine-tuning and reinforcement learning (DAPO) to create a scalable pipeline for training agentic systems.
  • Strong Benchmark Performance: Achieves a Pass@3 score of 61.1% on GAIA and 54.6% on WebWalkerQA, indicating its proficiency in handling challenging web-based question answering and task execution.

Good For

  • Autonomous Agents: Ideal for building agents that need to independently search for and process information from the web.
  • Complex Information Retrieval: Suited for tasks requiring multi-step reasoning and interaction with web environments.
  • Research and Analysis Automation: Can be applied to automate aspects of research by autonomously seeking and synthesizing information.
Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p