Alibaba-NLP/WebWatcher-32B
WebWatcher-32B by Alibaba-NLP is a 32-billion-parameter multimodal agent designed for deep research, featuring enhanced visual-language reasoning and multi-tool interaction within a unified framework. It excels at complex reasoning, information retrieval, and knowledge-retrieval integration, significantly outperforming proprietary baselines such as GPT-4o on benchmarks including HLE-VL, BrowseComp-VL, LiveVQA, and MMSearch. Its primary use cases are advanced visual search and in-depth multimodal reasoning tasks that require strategic planning and tool use.
WebWatcher: A Multimodal Deep Research Agent
WebWatcher, developed by Alibaba-NLP, is a 32-billion-parameter multimodal agent engineered specifically for deep research, integrating advanced visual-language reasoning with multi-tool interaction. The model introduces a unified framework for tackling complex information-gathering and analysis tasks.
Key Capabilities
- Enhanced Visual-Language Reasoning: Combines visual perception with advanced language understanding for in-depth analysis.
- Multi-Tool Interaction: Equipped with tools such as Web Image Search, Web Text Search, Webpage Visit, Code Interpreter, and an internal OCR tool for comprehensive information gathering.
- Automated Trajectory Generation: Utilizes an automated pipeline to create high-quality, multi-step reasoning trajectories for robust tool-use capabilities and efficient training.
- Superior Performance: Significantly outperforms proprietary models such as GPT-4o and Gemini-2.5-Flash, as well as the open-source Qwen2.5-VL-72B, across challenging VQA benchmarks.
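To make the multi-tool interaction concrete, the core of such an agent is a loop that routes model-emitted tool calls to the right backend. The sketch below is a minimal, hypothetical dispatcher: the tool names mirror the capability list above, but the actual WebWatcher tool interfaces are defined by its own framework and are not documented here.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ToolCall:
    """A single tool invocation emitted by the agent (hypothetical schema)."""
    name: str
    argument: str

class ToolRegistry:
    """Maps tool names to callables and dispatches incoming calls."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def dispatch(self, call: ToolCall) -> str:
        # Unknown tools return an error string instead of raising,
        # so the agent can recover and re-plan.
        if call.name not in self._tools:
            return f"unknown tool: {call.name}"
        return self._tools[call.name](call.argument)

# Stub tools standing in for the real Web Text Search / Code Interpreter.
registry = ToolRegistry()
registry.register("web_text_search", lambda q: f"results for '{q}'")
registry.register("code_interpreter", lambda src: str(eval(src)))  # sketch only; eval is unsafe in production

print(registry.dispatch(ToolCall("code_interpreter", "2 + 3")))  # → 5
```

In a real agent loop, the model's output would be parsed into `ToolCall` objects, each observation appended to the reasoning trajectory, and the loop repeated until the agent emits a final answer.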
Performance Highlights
WebWatcher-32B demonstrates leading performance on several benchmarks:
- Complex Reasoning (HLE-VL): Achieved a Pass@1 score of 13.6%, surpassing GPT-4o (9.8%).
- Information Retrieval (MMSearch): Scored 55.3% Pass@1, outperforming Gemini-2.5-Flash (43.9%) and GPT-4o (24.1%).
- Knowledge-Retrieval Integration (LiveVQA): Achieved 58.7% Pass@1, exceeding Gemini-2.5-Flash (41.3%) and GPT-4o (34.0%).
- Information Optimization and Aggregation (BrowseComp-VL): Led with an average score of 27.0%, more than doubling GPT-4o (13.4%) and Gemini-2.5-Flash (13.0%).
Good for
- Deep Research Tasks: Ideal for scenarios requiring extensive information gathering and complex reasoning across visual and textual modalities.
- Advanced Visual Search: Excels in real-world visual search benchmarks, providing precise retrieval and robust information aggregation.
- Multimodal Agent Development: Serves as a strong baseline for developing and evaluating multimodal agents that require strategic planning and tool use.