Overview
WebExplorer-8B: A Long-Horizon Web Agent
WebExplorer-8B, developed by hkust-nlp, is an 8 billion parameter web agent model specifically engineered for complex information-seeking and long-horizon reasoning tasks. Built upon the Qwen3-8B base model, it leverages a unique data generation approach involving model-based exploration and iterative query evolution to create challenging, multi-step reasoning tasks.
Key Capabilities
- Long-horizon Reasoning: Supports an extensive 128K context length and up to 100 tool calling turns, enabling deep, multi-step problem-solving.
- Advanced Tool Utilization: Proficiently uses both search and browse tools for effective web interaction and information extraction.
- State-of-the-Art Performance: Achieves leading results among models under 10 billion parameters on benchmarks like BrowseComp-en/zh, WebWalkerQA, and FRAMES, often outperforming larger models like WebSailor-72B in specific tasks.
- Robust Training: Developed through a two-phase approach: Supervised Fine-tuning (SFT) with high-quality trajectories, followed by Reinforcement Learning (RL) using the GRPO algorithm and progressive context expansion.
Good For
- Complex Information Retrieval: Excels at tasks requiring multiple search queries and web page interactions to synthesize answers.
- Automated Web Navigation: Ideal for applications needing an agent to autonomously browse and extract specific information from diverse online sources.
- Long-Term Problem Solving: Its ability to handle up to 100 tool turns makes it suitable for intricate problems that require extended interaction with web environments.