GenPRM-7B: Generative Process Reward Model
GenPRM-7B is a 7.6-billion-parameter generative process reward model (PRM) that introduces several innovations for reasoning and verification. Rather than emitting a score directly, it performs explicit Chain-of-Thought (CoT) reasoning and code verification before making each process judgment, and it improves on standard Monte Carlo estimation and hard labeling through Relative Progress Estimation (RPE).
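To make the generate-then-judge flow concrete, here is a minimal sketch of how a generative PRM might evaluate one solution step: produce a CoT analysis, write and run verification code, and only then emit a verdict. The function names, prompts, and the stubbed `generate`/`run_code` callables are all hypothetical stand-ins, not the model's actual interface.

```python
def judge_step(step, generate, run_code):
    """Sketch of a generative PRM judgment for one solution step:
    the model writes a CoT analysis, writes verification code, runs it,
    and only then emits a yes/no verdict. `generate` and `run_code` are
    hypothetical stand-ins for the model and a sandboxed interpreter."""
    analysis = generate(f"Analyze this step: {step}")
    code = generate(f"Write code that checks: {step}")
    result = run_code(code)
    prompt = f"Analysis: {analysis}\nCode result: {result}\nIs the step correct (yes/no)?"
    return generate(prompt)

# Stubbed model and interpreter so the sketch runs end to end.
generate = lambda prompt: "yes"
run_code = lambda code: True
verdict = judge_step("2 + 2 = 4", generate, run_code)  # -> "yes"
```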
Key Capabilities
- State-of-the-Art Verification: As a verifier, GenPRM-7B outperforms classification-based PRMs of comparable size and, with test-time scaling, even surpasses larger models such as Qwen2.5-Math-PRM-72B.
- Superior Critique: As a critic, the model delivers 3.4x greater performance improvement than DeepSeek-R1-Distill-Qwen-7B after three refinement iterations.
- Test-Time Scaling: Supports parallel test-time scaling with majority voting for GenPRM itself, and acts as a verifier or critic for policy models.
- Mathematical Reasoning: Trained on 23K SFT examples, including the GenPRM-MATH-Data dataset, with DeepSeek-R1-Distill series models as the base, making it particularly adept at mathematical problem-solving and critique.
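As an illustration of the parallel test-time scaling mentioned above, the sketch below aggregates per-step verdicts from N independent GenPRM samples by majority vote, then scores a full solution as the minimum step reward (product or mean are common alternatives). The sample verdicts are fabricated for illustration; in practice each would come from a separate GenPRM generation.

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate N independent per-step verdicts ('+' correct, '-' incorrect)
    sampled in parallel from a generative PRM into one verdict per step."""
    return [Counter(step).most_common(1)[0][0] for step in zip(*judgments)]

def solution_score(step_rewards):
    """Collapse per-step rewards into one solution-level score
    (minimum over steps, a common PRM aggregation)."""
    return min(step_rewards)

# Hypothetical example: 3 GenPRM samples judging a 4-step solution.
samples = [
    ['+', '+', '-', '+'],
    ['+', '+', '+', '+'],
    ['+', '-', '-', '+'],
]
verdicts = majority_vote(samples)  # -> ['+', '+', '-', '+']
```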
Good For
- Automated Code Verification: Leveraging its explicit code verification capabilities.
- Process Supervision: Providing detailed, step-by-step feedback and judgment on reasoning processes.
- Improving LLM Outputs: Acting as a critic to refine and enhance the quality of other language models' generated content, especially in complex reasoning tasks.
- Mathematical Problem Solving: Excelling in tasks requiring detailed mathematical reasoning and solution critique.
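To make the verifier role concrete, here is a minimal Best-of-N reranking sketch: a policy model proposes N candidate solutions, the PRM assigns each step a reward, and the candidate with the highest aggregated score is selected. The `rewards` values are stubbed placeholders; in a real setup they would be derived from GenPRM-7B's judgments.

```python
def aggregate(step_rewards):
    """Collapse per-step rewards into one solution-level score
    (minimum over steps; product or mean are common alternatives)."""
    return min(step_rewards)

def best_of_n(candidates, step_rewards_per_candidate):
    """Return the candidate whose PRM step rewards aggregate highest."""
    best = max(range(len(candidates)),
               key=lambda i: aggregate(step_rewards_per_candidate[i]))
    return candidates[best]

# Hypothetical stubbed step rewards for three candidate solutions.
candidates = ["solution A", "solution B", "solution C"]
rewards = [[0.9, 0.4, 0.8], [0.95, 0.9, 0.85], [0.7, 0.6, 0.9]]
chosen = best_of_n(candidates, rewards)  # -> "solution B"
```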