GenPRM/GenPRM-1.5B
Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Apr 3, 2025 | License: MIT | Architecture: Transformer | Open Weights

GenPRM/GenPRM-1.5B is a 1.5-billion-parameter generative process reward model (PRM) developed by GenPRM. It verifies reasoning steps using explicit Chain-of-Thought (CoT) reasoning and code verification. It refines Monte Carlo estimation with Relative Progress Estimation (RPE) and supports test-time scaling in two ways: scaling its own inference compute, and serving as a verifier or critic for policy models. The model targets mathematical reasoning and critique tasks: it is reported to outperform larger classification-based PRMs at verification and to yield significant performance gains when used as a critic.
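The description above does not spell out how RPE works, so the following is only a minimal sketch of one plausible reading: a step's Monte Carlo score is the fraction of sampled rollouts that reach a correct answer, and the step is labeled by comparing the score after the step with the score before it. The function names, the ratio form, and the `threshold` default of 0.8 are illustrative assumptions, not values taken from this page.

```python
from typing import Sequence


def mc_estimate(rollout_correct: Sequence[bool]) -> float:
    """Monte Carlo score: fraction of sampled rollouts ending in a correct answer."""
    if not rollout_correct:
        raise ValueError("need at least one rollout")
    return sum(rollout_correct) / len(rollout_correct)


def relative_progress(mc_before: float, mc_after: float) -> float:
    """Ratio of the MC score after a step to the score before it.

    Assumption: if no rollout succeeded before the step, the ratio is
    defined as 0.0 (there is no progress to preserve).
    """
    if mc_before == 0.0:
        return 0.0
    return mc_after / mc_before


def step_label(mc_before: float, mc_after: float, threshold: float = 0.8) -> int:
    """Label a step 1 (acceptable) or 0 (harmful).

    `threshold` is a hypothetical hyperparameter chosen for illustration.
    """
    return 1 if relative_progress(mc_before, mc_after) >= threshold else 0


if __name__ == "__main__":
    before = mc_estimate([True, True, False, False])   # 2 of 4 rollouts correct
    after = mc_estimate([True, False, False, False])   # 1 of 4 rollouts correct
    print(step_label(before, after))                   # ratio 0.5 is below 0.8
```

Compared with thresholding the raw Monte Carlo score, a relative measure of this kind judges a step by how much it changes the chance of success, so a sound step taken from an already difficult state is not penalized merely because its absolute score is low.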
