launch/ThinkPRM-14B
ThinkPRM-14B: Generative Process Reward Model
ThinkPRM-14B is a 14.8 billion parameter generative Process Reward Model (PRM) built upon the R1-Distill-Qwen-14B architecture. Its core function is to provide step-level verification scores and critiques for reasoning processes, such as mathematical solutions, by generating an explicit chain-of-thought (CoT) that labels each step as correct or incorrect.
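To make the input format concrete, here is a minimal sketch of assembling a problem and a candidate solution prefix into a verification prompt. The template below is illustrative only, not the model's official chat template; the `build_verification_prompt` helper is hypothetical.

```python
# Hypothetical helper: the exact prompt wording is an assumption, not the
# official ThinkPRM template -- consult the model card's chat template.
def build_verification_prompt(problem: str, steps: list[str]) -> str:
    """Format a problem and a candidate solution prefix for step-level
    verification by a generative PRM."""
    numbered = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    return (
        "You are given a math problem and a proposed step-by-step solution.\n"
        "Review each step and judge whether it is correct.\n\n"
        f"Problem: {problem}\n\nSolution:\n{numbered}\n"
    )

prompt = build_verification_prompt(
    "What is 12 * 8?",
    ["12 * 8 = 12 * (10 - 2)", "= 120 - 24 = 96"],
)
```

The resulting string would then be passed to the model (e.g. via `transformers` text generation), which responds with a verification chain-of-thought labeling each step.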
Key Capabilities & Features
- Step-by-Step Verification: Generates natural language critiques and correctness judgments for each step in a solution prefix.
- Data Efficiency: Achieves strong performance with significantly less supervision data (1K synthetic examples) compared to traditional discriminative PRMs.
- Interpretability: Trained with a standard language modeling objective, so its judgments come with readable chain-of-thought rationales rather than opaque scalar scores, making the verification process transparent and easy to scale.
- Superior Performance: Outperforms LLM-as-a-judge and discriminative PRM baselines (trained on ~100x more labels) on benchmarks like ProcessBench, MATH-500, AIME '24, GPQA-Diamond, and LiveCodeBench.
- High Context Length: Supports a context length of 131,072 tokens.
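Since the verifier emits its judgments inside free-form chain-of-thought, downstream code has to parse them out. A minimal sketch, assuming each step's verdict is marked with a `\boxed{correct}` / `\boxed{incorrect}` token (an assumed convention here; check the model's actual critique format before relying on it):

```python
import re

def extract_step_labels(verification_cot: str) -> list[bool]:
    """Return True for each step judged correct, in order of appearance.

    Assumes the (hypothetical) convention that the verifier marks every
    step with \\boxed{correct} or \\boxed{incorrect} in its critique.
    """
    labels = re.findall(r"\\boxed\{(correct|incorrect)\}", verification_cot)
    return [label == "correct" for label in labels]

cot = (
    "Step 1 rewrites 12*8 as 12*(10-2), which is valid. \\boxed{correct}\n"
    "Step 2 computes 120 - 24 = 96, which is right. \\boxed{correct}"
)
extract_step_labels(cot)  # -> [True, True]
```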
Ideal Use Cases
- Solution Scoring: Assigning step-level or overall scores to candidate solutions, for ranking in Best-of-N sampling or for guiding tree search in reasoning tasks.
- Verification Rationale Generation: Producing detailed chain-of-thought verifications that explain why a particular step is correct or incorrect, enhancing interpretability.
- Standalone Evaluation: Directly evaluating the correctness of a given problem-solution pair in domains like mathematical reasoning, scientific QA, and code generation.
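The Best-of-N use case can be sketched in a few lines. This assumes each candidate solution already has per-step correctness scores from the verifier; aggregating by the minimum over steps is one common choice, not the only one, and the names below are illustrative.

```python
def score_solution(step_probs: list[float]) -> float:
    """Aggregate step-level scores into one solution score (min over steps),
    so a single weak step penalizes the whole solution."""
    return min(step_probs) if step_probs else 0.0

def best_of_n(candidates: dict[str, list[float]]) -> str:
    """Pick the candidate whose weakest step is strongest."""
    return max(candidates, key=lambda name: score_solution(candidates[name]))

candidates = {
    "solution_a": [0.95, 0.40, 0.90],  # one shaky step drags it down
    "solution_b": [0.80, 0.85, 0.82],
}
best_of_n(candidates)  # -> "solution_b"
```

The same per-step scores can drive tree search instead of Best-of-N, by expanding only partial solutions whose prefix score stays above a threshold.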