ZYao720/WebArbiter-4B-Qwen3
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Apr 8, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

WebArbiter-4B-Qwen3 is a 4-billion-parameter reasoning Process Reward Model (PRM) for web agents, developed by ZYao720 and built on Qwen3-4B. It formulates step-level reward modeling as structured text generation, producing interpretable, principle-inducing justifications for each web agent action. The model achieves an average Best-of-N (BoN) accuracy of 72.55% on WebPRMBench, showing strong performance at evaluating and guiding web agent trajectories with roughly half the parameters of larger alternatives.
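To make the Best-of-N metric concrete, here is a minimal sketch of how a step-level reward model can be used to pick one action out of N candidates, and how BoN accuracy could be computed from that choice. It assumes BoN accuracy is the fraction of instances where the top-scored candidate is the correct one; the exact WebPRMBench protocol may differ. The names `best_of_n`, `bon_accuracy`, `toy_score`, and the score table are hypothetical stand-ins, not part of the model's actual interface.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical scores standing in for a reward model's judgment of each action.
# A real PRM would derive these from its generated justification and verdict.
TOY_SCORES = {
    "click('search')": 0.2,
    "type('query', 'shoes')": 0.9,
    "scroll('down')": 0.1,
    "click('submit')": 0.4,
    "click('cancel')": 0.7,
}


def toy_score(action: str) -> float:
    """Look up a fixed illustrative score; unknown actions score 0.0."""
    return TOY_SCORES.get(action, 0.0)


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> int:
    """Return the index of the highest-scoring candidate (first one wins ties)."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


def bon_accuracy(
    instances: Sequence[Tuple[Sequence[str], int]],
    score: Callable[[str], float],
) -> float:
    """Fraction of instances where the top-scored candidate is the correct one.

    Each instance is (candidate_actions, index_of_correct_action).
    """
    if not instances:
        raise ValueError("instances must be non-empty")
    hits = sum(best_of_n(cands, score) == gold for cands, gold in instances)
    return hits / len(instances)


# Two illustrative steps: the scorer picks correctly on the first, not the second.
instances = [
    (["click('search')", "type('query', 'shoes')", "scroll('down')"], 1),
    (["click('submit')", "click('cancel')"], 0),
]

accuracy = bon_accuracy(instances, toy_score)
print(f"BoN accuracy: {accuracy:.2%}")
```

In practice `toy_score` would be replaced by a function that prompts the reward model with the task, page state, and candidate action, then parses a score or preference from its generated text.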
