Name: skylenage-ai/GPRM-4B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: skylenage-ai

Overview

skylenage-ai/GPRM-4B is a 4-billion-parameter Global Perspective Process Reward Model (GPRM) for verifying reasoning in complex, multi-step tasks. Unlike traditional Process Reward Models (PRMs), which evaluate each step in isolation, GPRM takes a "Global Perspective": it conditions on the evaluations of earlier steps and anticipates how the current step affects later reasoning.

Key Capabilities

History-Aware Evaluation: Explicitly conditions on previous steps and their judgments to maintain context.
Future-Informed Reasoning: Incorporates a look-ahead mechanism to validate steps against subsequent derivations.
4-D Diagnostic Framework: Utilizes a structured evaluation across Look-back, Look-ahead, Self-check, and Goal alignment for robust error localization.
Superior Benchmark Performance: Achieves an overall score of 73.9 on PRMBench and an average F1 score of 74.3 on ProcessBench, outperforming larger models such as GPT-4o and o1-mini on some metrics.
Enhanced Agent Error Detection: Demonstrates improved accuracy in identifying errors in agent-based tasks, scoring 47.0% average accuracy on Agent Error Bench.
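
The history-aware, future-informed evaluation described above can be sketched as a loop over reasoning steps. This is a minimal illustration under stated assumptions, not the model's actual interface: the Diagnosis record, the Judge signature, locate_first_error, and toy_judge are all hypothetical names, and treating a step as correct only when all four dimensions pass is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# The four diagnostic dimensions named in the capability list.
DIMENSIONS = ("look_back", "look_ahead", "self_check", "goal_alignment")


@dataclass
class Diagnosis:
    """Hypothetical per-step verdict, one boolean per dimension."""
    look_back: bool
    look_ahead: bool
    self_check: bool
    goal_alignment: bool

    @property
    def is_correct(self) -> bool:
        # Assumption: a step passes only if every dimension passes.
        return all(getattr(self, name) for name in DIMENSIONS)


# A judge sees (history of judged steps, current step, future steps).
History = List[Tuple[str, Diagnosis]]
Judge = Callable[[History, str, List[str]], Diagnosis]


def locate_first_error(steps: List[str], judge: Judge) -> Optional[int]:
    """Return the index of the first step the judge flags, or None."""
    history: History = []
    for i, step in enumerate(steps):
        diagnosis = judge(history, step, steps[i + 1:])
        if not diagnosis.is_correct:
            return i
        # Earlier steps and their judgments become context for later ones.
        history.append((step, diagnosis))
    return None


def toy_judge(history: History, current: str, future: List[str]) -> Diagnosis:
    """Stand-in for a real model call: fails self-check on a marker string."""
    ok = "WRONG" not in current
    return Diagnosis(look_back=True, look_ahead=True,
                     self_check=ok, goal_alignment=True)
```

In practice, toy_judge would be replaced by a call to the reward model. Stopping at the first flagged step is a choice made here to mirror error localization, not a documented behavior of GPRM-4B.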

Training Strategy

GPRM-4B was developed using a two-stage progressive training pipeline:

Structured SFT: Supervised fine-tuning teaches the model 4-dimensional diagnostic reasoning via targeted error injection, with Qwen3-235B-Instruct used for annotation.
GRPO Optimization: Group Relative Policy Optimization then refines the evaluation policy on hard-mined samples from PRM800K, incorporating the complete global context (History + Current + Future).
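
The core idea of Group Relative Policy Optimization is that each sampled output is scored relative to the other outputs sampled for the same prompt, which removes the need for a separate value model. A minimal sketch of that normalization step follows. The function name, the eps constant, the use of the population standard deviation, and the 0/1 reward in the example are assumptions for illustration; the page does not describe the reward used to train GPRM-4B.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps avoids division by zero when every sample gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled judgments for one step:
# reward 1.0 if the judgment matched the label, else 0.0.
example = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Samples that beat the group average get a positive advantage and the rest a negative one. When every sample in a group receives the same reward, all advantages are zero and the group contributes no gradient signal, which is one reason hard-mined samples are useful for this stage.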

Good for

Complex Reasoning Tasks: Ideal for applications requiring meticulous step-by-step verification and error localization in long reasoning chains.
Mathematical Problem Solving: Performs well on benchmarks such as ProcessBench (GSM8K, MATH, OlympiadBench) and can guide downstream test-time search to improve mathematical accuracy.
Agent Error Detection: Useful for improving the reliability and debugging capabilities of AI agents by identifying and correcting errors in their operational steps.
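
One common form of test-time search with a process reward model is best-of-N selection: sample several candidate solutions, score each step, and keep the candidate with the best aggregate score. The sketch below illustrates that pattern under stated assumptions. The names best_of_n and toy_step_scores are hypothetical, the scorer is a stand-in for a real model call, and aggregating by the minimum step score is one common choice rather than something the model card specifies.

```python
from typing import Callable, List, Sequence

# A step scorer maps one candidate (a list of steps) to one score per step.
StepScorer = Callable[[Sequence[str]], List[float]]


def best_of_n(
    candidates: Sequence[Sequence[str]],
    score_steps: StepScorer,
    aggregate: Callable[[List[float]], float] = min,
) -> int:
    """Return the index of the candidate with the highest aggregate score.

    With aggregate=min, a chain is only as good as its weakest step.
    Candidates are assumed to be non-empty lists of steps.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(
        range(len(candidates)),
        key=lambda i: aggregate(score_steps(candidates[i])),
    )


def toy_step_scores(steps: Sequence[str]) -> List[float]:
    """Stand-in scorer: low score for steps containing a marker string."""
    return [0.1 if "WRONG" in step else 0.9 for step in steps]


chains = [
    ["x = 2", "x + 3 = 6 WRONG"],
    ["x = 2", "x + 3 = 5"],
]
chosen = best_of_n(chains, toy_step_scores)
```

Here chosen is the index of the second chain, because its weakest step scores higher than the flagged step in the first chain. Swapping aggregate for a product or mean changes how strongly a single bad step penalizes a candidate.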
