Qwen/WebWorld-32B

Text Generation | Concurrency Cost: 2 | Model Size: 32B | Quant: FP8 | Ctx Length: 32k | Published: Feb 13, 2026 | License: apache-2.0 | Architecture: Transformer | 0.1K | Open Weights | Warm

Qwen/WebWorld-32B is a 32-billion-parameter open-web world model from Qwen, designed for training and evaluating web agents. It is trained on over 1 million real-world web interaction trajectories and supports long-horizon simulation and multi-format state representations such as A11y Tree, HTML, and Markdown. The model predicts the next page state for a given action and uses CoT-activated reasoning for transition prediction. Agents trained with it outperform their base models on web agent tasks, and as a world model in inference-time lookahead search it outperforms GPT-5.


WebWorld-32B: A World Model for Web Agents

Qwen/WebWorld-32B is a 32-billion-parameter model from the WebWorld series, designed as an open-web world model for training and evaluating web agents. It is built on the Qwen3-32B base model and trained on over 1 million real-world web interaction trajectories collected with a scalable hierarchical data pipeline.

Key Capabilities & Features

  • Long-horizon simulation: Supports web interaction simulations spanning 30+ steps.
  • Multi-format state representations: Handles various web state formats including A11y Tree, HTML, XML, Markdown, and natural language.
  • CoT-activated reasoning: Incorporates Chain-of-Thought reasoning for accurate transition prediction between web states.
  • Cross-domain generalization: Demonstrates strong performance beyond the web, including code, GUI, and game environments.
  • Enhanced agent performance: Agents trained with WebWorld-synthesized trajectories show significant improvements, achieving +9.9% on MiniWob++ and +10.9% on WebArena compared to base models.
  • Superior world modeling: Outperforms GPT-5 as a world model during inference-time lookahead search.
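
The lookahead search and long-horizon simulation mentioned above can be illustrated with a minimal sketch. It assumes the world model is exposed as a plain function from (state, action) to a predicted next state. The names `predict_next_state`, `score_state`, `rollout`, and `lookahead_select` are hypothetical and are not part of any WebWorld API, and the toy functions stand in for real model calls.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: in practice predict_next_state would wrap a call to
# the world model, and score_state would be a task-specific value estimate.
PredictFn = Callable[[str, str], str]  # (state, action) -> predicted next state
ScoreFn = Callable[[str], float]       # state -> estimated progress toward goal


def rollout(state: str, actions: Sequence[str], predict_next_state: PredictFn) -> List[str]:
    """Simulate a sequence of actions, returning every predicted state in order."""
    states = []
    for action in actions:
        state = predict_next_state(state, action)
        states.append(state)
    return states


def lookahead_select(
    state: str,
    candidate_actions: Sequence[str],
    predict_next_state: PredictFn,
    score_state: ScoreFn,
) -> str:
    """One-step lookahead: pick the action whose predicted next state scores highest."""
    if not candidate_actions:
        raise ValueError("candidate_actions must not be empty")
    return max(candidate_actions, key=lambda a: score_state(predict_next_state(state, a)))


# Toy stand-ins so the sketch runs without a model (assumptions, not real behavior).
def toy_predict(state: str, action: str) -> str:
    return f"{state} -> {action}"


def toy_score(state: str) -> float:
    return 1.0 if state.endswith("click [submit]") else 0.0


best = lookahead_select("form page", ["click [cancel]", "click [submit]"], toy_predict, toy_score)
trace = rollout("s0", ["a", "b"], toy_predict)
```

Keeping the world model behind a callable means the same search code can be exercised with a stub and later pointed at a served model without changes.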

Performance Highlights

In intrinsic evaluations, WebWorld-32B reaches a Factuality Score of 71.0 and a Web Turing Score of 45.6, which measure functional correctness and perceptual realism respectively. In extrinsic evaluations, it raises web agent success rates: agents trained on WebWorld-synthesized trajectories gain +9.9% on MiniWob++ and +10.9% on WebArena over base models.

Ideal Use Cases

  • Developing and training web agents: Provides a robust environment for simulating web interactions.
  • High-fidelity web simulation: Recommended for scenarios requiring accurate and long-horizon web state prediction.
  • Task-specific fine-tuning: Can be further fine-tuned on in-domain trajectories for optimal results in specific web environments.
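
For the fine-tuning use case above, one plausible way to prepare in-domain trajectories is to split each one into (state, action) to next-state pairs. The sketch below assumes a trajectory is stored as a list of textual states and the actions between them. The prompt template and the `prompt`/`completion` field names are illustrative assumptions, not the format WebWorld was trained with.

```python
import json
from typing import Dict, List


def trajectory_to_pairs(states: List[str], actions: List[str]) -> List[Dict[str, str]]:
    """Turn one trajectory into next-state prediction examples.

    Expects states[i] --actions[i]--> states[i + 1], so there must be exactly
    one more state than actions.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected exactly one more state than actions")
    pairs = []
    for i, action in enumerate(actions):
        # Hypothetical prompt layout; adapt to the template your trainer expects.
        prompt = f"Current state:\n{states[i]}\n\nAction:\n{action}\n\nNext state:"
        pairs.append({"prompt": prompt, "completion": states[i + 1]})
    return pairs


def to_jsonl(pairs: List[Dict[str, str]]) -> str:
    """Serialize examples as JSON Lines, one example per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


pairs = trajectory_to_pairs(
    states=["# Login page", "# Dashboard", "# Settings"],
    actions=["click [Sign in]", "click [Settings]"],
)
jsonl = to_jsonl(pairs)
```

Each action contributes one training example, so a 30-step trajectory yields 30 pairs.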