dongboklee/gPRM-14B-merged

Text Generation · Model Size: 14.8B · Quantization: FP8 · Context Length: 32k · Published: Sep 29, 2025 · License: apache-2.0 · Architecture: Transformer

dongboklee/gPRM-14B-merged is a 14.8-billion-parameter language model developed by Dong Bok Lee et al.: the gPRM-14B reward model with its LoRA adapter merged into the base weights for vLLM inference. It is designed as a process reward model for multi-domain test-time scaling. Its primary application is to review and critique step-by-step solutions to problems, judging the correctness of each step.


Overview

dongboklee/gPRM-14B-merged is a 14.8-billion-parameter model, specifically a LoRA-merged version of gPRM-14B optimized for vLLM inference. Developed by Dong Bok Lee and collaborators, it functions as a reward model, as detailed in their paper "Rethinking Reward Models for Multi-Domain Test-Time Scaling" (arXiv:2510.00492). Its core capability is evaluating the correctness of individual steps within a proposed solution to a given problem.

Key Capabilities

  • Solution Step Verification: Critiques and verifies each step in a provided multi-step solution.
  • Reward Calculation: Derives a reward score from the relative likelihood of "Yes" versus "No" judgments about a step's correctness.
  • Multi-Domain Application: Designed for test-time scaling across various domains, suggesting adaptability to different problem types.
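The reward calculation described above can be sketched numerically. This is a minimal illustration, not the authors' exact implementation: it assumes the model emits log-probabilities for "Yes" and "No" at each step (e.g. obtained from an inference engine's logprobs output), converts them to a per-step correctness probability with a two-way softmax, and aggregates across steps by taking the minimum, a common process-reward-model convention assumed here.

```python
import math


def step_reward(yes_logprob: float, no_logprob: float) -> float:
    """Probability mass on "Yes", renormalized over {"Yes", "No"}.

    Equivalent to a two-way softmax over the two token log-probabilities.
    """
    yes = math.exp(yes_logprob)
    no = math.exp(no_logprob)
    return yes / (yes + no)


def solution_score(step_logprobs: list[tuple[float, float]]) -> float:
    """Score a multi-step solution from per-step (yes, no) log-probabilities.

    Taking the minimum (an assumption for illustration) rates a solution
    by its weakest step; other aggregations such as the product are also used.
    """
    return min(step_reward(y, n) for (y, n) in step_logprobs)
```

For example, equal log-probabilities give a step reward of 0.5, and a solution containing one uncertain step is capped at that step's score regardless of how confident the other steps are.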

Good For

  • Automated Solution Assessment: Ideal for systems requiring automated evaluation of problem-solving steps.
  • Reinforcement Learning from Human Feedback (RLHF) Pipelines: Can be integrated as a reward model component in RLHF setups.
  • Educational Tools: Potentially useful in educational platforms to provide feedback on student-generated solutions.