dongboklee/gPRM-14B-merged
dongboklee/gPRM-14B-merged is a 14.8 billion parameter LoRA-merged language model developed by Dong Bok Lee et al. and prepared for vLLM inference. It is the LoRA-merged version of gPRM-14B and is designed as a reward model for multi-domain test-time scaling. Its primary application is reviewing step-by-step solutions to problems and judging the correctness of each step.
Overview
dongboklee/gPRM-14B-merged is a 14.8 billion parameter model, specifically a LoRA-merged version of gPRM-14B prepared for vLLM inference. Developed by Dong Bok Lee and collaborators, it functions as a reward model, as detailed in their paper "Rethinking Reward Models for Multi-Domain Test-Time Scaling" (arXiv:2510.00492). Its core capability is evaluating the correctness of individual steps within a proposed solution to a given problem.
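Because the LoRA weights are already merged into the checkpoint, it can be loaded with vLLM like any standalone model. The snippet below is a minimal sketch; the `dtype` and memory settings are illustrative assumptions, not documented requirements.

```python
from vllm import LLM

# Load the merged checkpoint directly; no LoRA adapter configuration is needed
# because the adapter weights are already folded into the base model.
llm = LLM(
    model="dongboklee/gPRM-14B-merged",
    dtype="bfloat16",              # assumed precision; adjust to your hardware
    gpu_memory_utilization=0.9,    # illustrative setting
)
```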
Key Capabilities
- Solution Step Verification: Critiques and verifies each step in a provided multi-step solution.
- Reward Calculation: Computes a reward score from the probability the model assigns to "Yes" versus "No" when asked whether a step is correct (see the sketch after this list).
- Multi-Domain Application: Designed for test-time scaling across various domains, suggesting adaptability to different problem types.
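A hedged sketch of how such a step reward could be computed with vLLM follows. The prompt template and the "Yes"/"No" comparison are assumptions made for illustration; the exact verification prompt used to train gPRM is not documented here.

```python
import math
from vllm import LLM, SamplingParams

# Hypothetical verification prompt; the actual template used for gPRM may differ.
PROMPT = (
    "Problem: {problem}\n"
    "Solution so far:\n{steps}\n"
    "Is the last step correct? Answer Yes or No.\nAnswer:"
)

# Engine as in the loading sketch above.
llm = LLM(model="dongboklee/gPRM-14B-merged", dtype="bfloat16")

# Generate a single token and request its top log-probabilities so the
# likelihoods of "Yes" and "No" can be compared.
params = SamplingParams(max_tokens=1, temperature=0.0, logprobs=20)

def step_reward(problem: str, steps: str) -> float:
    """Return P(Yes) / (P(Yes) + P(No)) for the last step as a soft reward."""
    prompt = PROMPT.format(problem=problem, steps=steps)
    completion = llm.generate([prompt], params)[0].outputs[0]
    token_logprobs = completion.logprobs[0]  # token_id -> Logprob for the generated token
    p_yes = p_no = 0.0
    for lp in token_logprobs.values():
        text = (lp.decoded_token or "").strip()
        if text == "Yes":
            p_yes += math.exp(lp.logprob)
        elif text == "No":
            p_no += math.exp(lp.logprob)
    total = p_yes + p_no
    return p_yes / total if total > 0 else 0.5

print(step_reward("Compute 2 + 3 * 4.", "Step 1: 3 * 4 = 12."))
```

For test-time scaling, such per-step rewards are typically aggregated (for example, by taking the minimum or the product across steps) into a trajectory-level score for ranking candidate solutions.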
Good For
- Automated Solution Assessment: Ideal for systems requiring automated evaluation of problem-solving steps.
- Reinforcement Learning from Human Feedback (RLHF) Pipelines: Can be integrated as a reward model component in RLHF setups.
- Educational Tools: Potentially useful in educational platforms to provide feedback on student-generated solutions.