Name: ZYao720/WebArbiter-3B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ZYao720

WebArbiter-3B: Principle-Guided Reasoning for Web Agents

WebArbiter-3B, developed by ZYao720, is a 3.1-billion-parameter Process Reward Model (PRM) designed for web agents. Built on Qwen2.5-3B-Instruct, it formulates step-level reward modeling as structured text generation, so it produces interpretable justifications rather than bare scalar scores.

Key Capabilities & Features

Reasoning as Reward: Generates structured outputs including <State>, <Criteria>, <Analysis>, and <Answer>, offering auditable reasoning chains for action preferences.
Principle-Inducing Evaluation: Dynamically derives evaluation principles from user intent and page state, enhancing robustness across diverse web environments.
Two-Stage Training: Applies reasoning distillation from an o3 teacher, then Reinforcement Learning with Verifiable Rewards (RLVR) using GRPO, to refine verdicts and align them with ground-truth correctness.
Strong Performance: Achieves an average Best-of-N (BoN) accuracy of 59.06% on the WEBPRMBENCH benchmark, surpassing the previous 3B SOTA WebPRM by 15.5 points and outperforming open-source LLM-as-judge baselines of up to 70B parameters.
Efficiency: Despite its compact size, it outperforms larger models such as WebShepherd-8B, making it suitable for resource-constrained deployment.
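
Because the verdict is emitted as tagged text, downstream code has to split it into sections before acting on it. The sketch below shows one way to do that in Python. The tag names come from the description above; the paired closing tags (such as </State>), the helper name parse_verdict, and the sample text are assumptions made for illustration, not the model's documented output format.

```python
import re
from typing import Dict, Optional

# Tag names are taken from the model description; the paired closing
# tags (e.g. </State>) are an assumption about the output format.
SECTION_TAGS = ("State", "Criteria", "Analysis", "Answer")


def parse_verdict(text: str) -> Dict[str, Optional[str]]:
    """Extract each tagged section from a generated verdict.

    Sections that are missing map to None so callers can detect
    malformed output instead of silently trusting it.
    """
    sections: Dict[str, Optional[str]] = {}
    for tag in SECTION_TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections


# Hand-written sample for illustration only; not real model output.
sample = (
    "<State>Search results page for wireless mouse.</State>\n"
    "<Criteria>The action should narrow results toward the item.</Criteria>\n"
    "<Analysis>Action 1 applies a filter; Action 2 leaves the page.</Analysis>\n"
    "<Answer>Action 1</Answer>"
)
verdict = parse_verdict(sample)
print(verdict["Answer"])
```

Returning None for absent sections keeps the parser usable as a format check: a caller can reject or retry any generation where the Answer section is missing.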

Intended Uses

Evaluating Web Agent Actions: Determines which of two candidate actions better advances a user's task given a web state.
Guiding Trajectory Search: Supplies the reward signal for Best-of-N sampling or tree search during web agent execution.
Interpretable Feedback: Offers structured, human-readable justifications for action preferences, aiding in debugging and analysis of web agent behavior.
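
The uses above combine naturally: a pairwise preference can drive Best-of-N selection by comparing the current best candidate against each challenger in turn. The following is a minimal sketch of that idea, not the author's search procedure. The Judge signature, the best_of_n helper, and toy_judge (a word-overlap stand-in for an actual call to the model) are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical judge signature: (intent, state, action_a, action_b) -> 0 or 1,
# where 0 means action_a is preferred and 1 means action_b is preferred.
Judge = Callable[[str, str, str, str], int]


def best_of_n(intent: str, state: str, candidates: Sequence[str], judge: Judge) -> str:
    """Pick one action from N candidates using N-1 pairwise comparisons."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    best = candidates[0]
    for challenger in candidates[1:]:
        # Keep the incumbent unless the judge prefers the challenger.
        if judge(intent, state, best, challenger) == 1:
            best = challenger
    return best


def toy_judge(intent: str, state: str, a: str, b: str) -> int:
    """Stand-in for the reward model: prefers the action sharing more words with the intent."""
    intent_words = set(intent.lower().split())

    def overlap(action: str) -> int:
        return len(intent_words & set(action.lower().split()))

    return 1 if overlap(b) > overlap(a) else 0


candidates = ["click home link", "type wireless mouse into search box", "scroll down"]
chosen = best_of_n("find a wireless mouse", "homepage", candidates, toy_judge)
print(chosen)
```

A sequential knockout like this needs only N-1 judge calls, at the cost of being sensitive to comparison order; a round-robin over all pairs would be more robust but quadratic in N.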

Limitations

WebArbiter-3B operates on text-only accessibility tree representations, so it can miss visual cues. It is currently English-only, may exhibit a safe-action bias, and occasionally hallucinates element references.
