GenPRM/GenPRM-7B
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Apr 3, 2025 | License: MIT | Architecture: Transformer | Open Weights | Warm

GenPRM/GenPRM-7B is a 7.6-billion-parameter generative process reward model developed by Jian Zhao, Runze Liu, and others. It performs explicit Chain-of-Thought (CoT) reasoning and code verification when judging solution steps, and it uses Relative Progress Estimation (RPE) to improve on Monte Carlo estimation with hard labeling. The model is reported to work as both a verifier and a critic, with state-of-the-art performance on process judgment and refinement tasks, particularly in mathematical reasoning.
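
This card does not spell out how RPE differs from plain Monte Carlo hard labeling, so the following is only a rough sketch of one plausible reading. It assumes a step's Monte Carlo score is the fraction of sampled continuations that reach the correct final answer, and that RPE labels a step by comparing its score with the score of the state before it. The function names and the 0.8 threshold are hypothetical choices for illustration, not values taken from this card.

```python
from typing import List


def mc_estimate(rollout_outcomes: List[bool]) -> float:
    """Fraction of sampled continuations that reach the correct final answer."""
    if not rollout_outcomes:
        raise ValueError("need at least one rollout outcome")
    return sum(rollout_outcomes) / len(rollout_outcomes)


def hard_label(mc_after: float) -> int:
    """Plain hard labeling: the step counts as correct if any rollout succeeds."""
    return 1 if mc_after > 0 else 0


def relative_progress_label(
    mc_before: float,
    mc_after: float,
    threshold: float = 0.8,  # hypothetical threshold, chosen for illustration
) -> int:
    """Assumed RPE-style label: compare the score after the step with the
    score before it, and accept the step only if most of the success
    probability is preserved."""
    if mc_before == 0:
        # No rollout succeeded before this step, so no progress can be measured.
        return 0
    return 1 if (mc_after / mc_before) >= threshold else 0


# A step that drops the success rate from 8/10 to 2/10 rollouts.
before = mc_estimate([True] * 8 + [False] * 2)
after = mc_estimate([True] * 2 + [False] * 8)
plain = hard_label(after)
relative = relative_progress_label(before, after)
```

In this example the plain hard label accepts the step because some rollouts still succeed, while the relative label rejects it because the step sharply reduced the estimated chance of success. That contrast is the intuition the sketch is meant to convey; the exact rule used to train the model may differ.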
