WebArbiter-7B: A Principle-Guided Reasoning Process Reward Model
WebArbiter-7B is a 7.6 billion parameter Process Reward Model (PRM) for web agents, developed by ZYao720 and based on Qwen2.5-7B-Instruct. Unlike traditional scalar or checklist-based reward models, WebArbiter-7B generates structured text outputs, including `<State>`, `<Criteria>`, `<Analysis>`, and `<Answer>`, to provide auditable reasoning chains for its preference verdicts. This approach allows evaluation principles to be derived dynamically from user intent and page state, improving robustness and generalization across diverse web environments.
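A consumer of the model's output needs to recover these sections before acting on the verdict. Below is a minimal parsing sketch; the section names come from this card, but the exact XML-like serialization and the sample text are illustrative assumptions, not the model's documented wire format.

```python
import re

# Section names from the model card; tag-style serialization is assumed.
SECTIONS = ("State", "Criteria", "Analysis", "Answer")

def parse_verdict(text: str) -> dict:
    """Extract each structured section from a WebArbiter-style output.

    Returns a dict mapping section name to its stripped contents,
    or None for any section that is missing from the text.
    """
    parsed = {}
    for name in SECTIONS:
        match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
        parsed[name] = match.group(1).strip() if match else None
    return parsed

# Hypothetical model output, for illustration only.
sample = (
    "<State>Search results page for 'red shoes'.</State>\n"
    "<Criteria>Prefer actions that move toward checkout.</Criteria>\n"
    "<Analysis>Action A opens the matching product; B navigates away.</Analysis>\n"
    "<Answer>A</Answer>"
)
verdict = parse_verdict(sample)
print(verdict["Answer"])  # → A
```

Keeping the sections in a dict makes it easy to log the `Analysis` chain alongside the final `Answer` when debugging agent runs.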
Key Capabilities
- Structured Reasoning: Provides interpretable, step-level evaluations with explicit reasoning, making its decisions transparent and debuggable.
- Superior Performance: Achieves an Avg. BoN Acc of 74.60% on the WEBPRMBENCH benchmark, surpassing GPT-5 by 9.1 points and the prior SOTA WebShepherd-8B by 31 points.
- Robust Generalization: Demonstrates state-of-the-art performance across various WebPRMBench environments, including out-of-domain enterprise workflows (WorkArena) and open-world websites (AssistantBench).
- Reward-Guided Trajectory Search: Significantly improves success rates in reward-guided trajectory search on WebArena-Lite, outperforming WebShepherd-8B by up to 6.4 points.
- Two-Stage Training: Utilizes reasoning distillation from a teacher model, followed by Reinforcement Learning with Verifiable Rewards (via GRPO) to refine judgments and align them with ground-truth correctness.
Good For
- Evaluating Web Agent Actions: Determining which of two candidate actions better advances a user's task in a given web state.
- Guiding Web Agent Trajectory Search: Serving as a robust reward signal for Best-of-N sampling or tree search mechanisms in web automation.
- Interpretable Feedback: Generating detailed, structured justifications for action preferences, aiding in debugging and analysis of web agent behavior.
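Since the model issues pairwise preference verdicts, Best-of-N selection can be run as a sequential knockout over candidate actions. The sketch below assumes the caller has already mapped the model's `<Answer>` field to an `"A"`/`"B"` string; the `judge` callable and the toy judge used in the demo are illustrative stand-ins, not part of the model's API.

```python
from typing import Callable, Sequence

def best_of_n(candidates: Sequence[str],
              judge: Callable[[str, str], str]) -> str:
    """Sequential-knockout Best-of-N using pairwise preference verdicts.

    judge(a, b) returns "A" if the first candidate is preferred and "B"
    otherwise -- a stand-in for querying WebArbiter-7B on a candidate
    pair and reading out its <Answer> section.
    """
    best = candidates[0]
    for challenger in candidates[1:]:
        # Keep the incumbent unless the judge prefers the challenger.
        if judge(best, challenger) == "B":
            best = challenger
    return best

# Toy judge that prefers the shorter action string (illustration only).
toy_judge = lambda a, b: "A" if len(a) <= len(b) else "B"
actions = ["click(#buy-now)", "scroll(down)", "type(#search, 'shoes')"]
print(best_of_n(actions, toy_judge))  # → scroll(down)
```

A knockout needs only N-1 judge calls per step, which keeps the PRM overhead linear when it is used inside a trajectory search loop.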