GenPRM/GenPRM-7B

Text generation | Concurrency cost: 1 | Model size: 7.6B | Quantization: FP8 | Context length: 32k | Published: Apr 3, 2025 | License: MIT | Architecture: Transformer | Open weights

GenPRM/GenPRM-7B is a 7.6 billion parameter generative process reward model developed by Jian Zhao, Runze Liu, and others, designed for explicit Chain-of-Thought (CoT) reasoning and code verification. It utilizes Relative Progress Estimation (RPE) to improve Monte Carlo estimation and hard labeling. This model excels as both a verifier and a critic, achieving state-of-the-art performance in process judgment and refinement tasks, particularly in mathematical reasoning.


GenPRM-7B: Generative Process Reward Model

GenPRM-7B is a 7.6 billion parameter generative process reward model (PRM). Before judging each step of a solution, it performs explicit Chain-of-Thought (CoT) reasoning and code verification. It also uses Relative Progress Estimation (RPE) to improve Monte Carlo estimation and hard labeling.
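This page does not give the RPE formula, so the following is only a sketch of one plausible reading: a step is judged by how the Monte Carlo success estimate changes across it, not by the estimate alone. The function name, the ratio form, the zero-denominator handling, and the default threshold of 0.8 are all assumptions made for illustration.

```python
def relative_progress_label(mc_before: float, mc_after: float,
                            threshold: float = 0.8) -> int:
    """Return a hard label (1 = good step, 0 = bad step) from relative progress.

    mc_before: assumed Monte Carlo estimate of reaching a correct answer
               from the state before the step.
    mc_after:  the same estimate after the step has been taken.
    threshold: assumed cut-off on the ratio mc_after / mc_before.
    """
    for value in (mc_before, mc_after):
        if not 0.0 <= value <= 1.0:
            raise ValueError("Monte Carlo estimates must lie in [0, 1]")
    if mc_before == 0.0:
        # Assumed convention: with no estimated chance of success before
        # the step, the ratio is undefined, so the step is labeled 0.
        return 0
    return 1 if mc_after / mc_before >= threshold else 0
```

Under these assumptions, a step that moves the estimate from 0.5 to 0.45 is labeled 1, while a step that drops it from 0.5 to 0.1 is labeled 0, even though the estimate after the step is still nonzero.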

Key Capabilities

  • State-of-the-Art Verification: As a verifier, GenPRM-7B outperforms classification-based PRMs of comparable size and even surpasses larger models like Qwen2.5-Math-PRM-72B through test-time scaling.
  • Superior Critique: In its role as a critic, the model demonstrates significant performance gains, achieving 3.4x greater improvement than DeepSeek-R1-Distill-Qwen-7B after three refinement iterations.
  • Test-Time Scaling: GenPRM's own judgments can be scaled at test time by sampling in parallel and taking a majority vote. The model can also act as a verifier or critic for policy models.
  • Mathematical Reasoning: Trained on 23K supervised fine-tuning (SFT) examples, including the GenPRM-MATH-Data dataset, with DeepSeek-R1-Distill series models as the base, making it particularly adept at mathematical problem-solving and critique.
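The parallel test-time scaling with majority voting mentioned in the list above can be sketched as follows. This is a minimal illustration, not GenPRM's own code: it assumes each sampled verification has already been reduced to a "Yes" or "No" label for the step, and the tie-breaking rule (ties count as "No") is an assumption chosen to be conservative.

```python
from collections import Counter
from typing import Sequence, Tuple


def majority_vote(judgments: Sequence[str]) -> Tuple[str, float]:
    """Aggregate independently sampled step judgments by majority vote.

    judgments: one label per sampled verification, "Yes" or "No".
    Returns the winning label and the fraction of samples agreeing with it.
    """
    if not judgments:
        raise ValueError("need at least one sampled judgment")
    counts = Counter(judgments)
    yes, no = counts.get("Yes", 0), counts.get("No", 0)
    # Assumed convention: a tie is resolved as "No".
    label = "Yes" if yes > no else "No"
    return label, counts[label] / len(judgments)


# Four sampled judgments for the same reasoning step.
print(majority_vote(["Yes", "No", "Yes", "Yes"]))  # ('Yes', 0.75)
```

Returning the agreement fraction alongside the label lets a caller use it as a soft score, for example when ranking several candidate solutions.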

Good For

  • Automated Code Verification: Leveraging its explicit code verification capabilities.
  • Process Supervision: Providing detailed, step-by-step feedback and judgment on reasoning processes.
  • Improving LLM Outputs: Acting as a critic to refine and enhance the quality of other language models' generated content, especially in complex reasoning tasks.
  • Mathematical Problem Solving: Excelling in tasks requiring detailed mathematical reasoning and solution critique.
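
The critic role described above can be pictured as a critique-and-revise loop around a policy model. The sketch below is hypothetical: `critic` and `reviser` are placeholder callables standing in for calls to GenPRM and to a policy model, and their signatures are assumptions, not part of any published API. The default of three iterations only mirrors the figure quoted earlier on this page.

```python
from typing import Callable, Tuple

# Hypothetical interfaces, assumed for illustration only.
# critic(problem, solution) -> (accepted, feedback)
Critic = Callable[[str, str], Tuple[bool, str]]
# reviser(problem, solution, feedback) -> revised solution
Reviser = Callable[[str, str, str], str]


def refine(problem: str, solution: str, critic: Critic, reviser: Reviser,
           max_iters: int = 3) -> str:
    """Run up to max_iters rounds of critique followed by revision.

    Stops early once the critic accepts the current solution. The solution
    produced by the final revision is returned without a further critique.
    """
    for _ in range(max_iters):
        accepted, feedback = critic(problem, solution)
        if accepted:
            break
        solution = reviser(problem, solution, feedback)
    return solution


# Toy stand-ins so the loop can be exercised without any model.
def toy_critic(problem: str, solution: str) -> Tuple[bool, str]:
    return solution == "4", "recheck the addition"


def toy_reviser(problem: str, solution: str, feedback: str) -> str:
    return "4"


print(refine("2 + 2 = ?", "5", toy_critic, toy_reviser))  # 4
```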