launch/ThinkPRM-14B
Text Generation | Concurrency Cost: 1 | Model Size: 14.8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 25, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

ThinkPRM-14B is a 14.8-billion-parameter generative Process Reward Model (PRM) developed by launch and built on R1-Distill-Qwen-14B. It is fine-tuned to verify reasoning step by step: for each problem-solution pair it generates an explicit verification chain-of-thought (CoT) with step-level labels. The model is data-efficient, requiring significantly less supervision data than traditional discriminative PRMs while still achieving strong performance. It can score solutions, generate detailed verification rationales, and evaluate problem-solution pairs across mathematical reasoning, scientific QA, and code generation tasks.
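To make the idea of step-level labels concrete, the sketch below shows one way a generated verification CoT could be turned into a per-step verdict and an overall solution judgment. It does not call the model. The label format (`\boxed{correct}` / `\boxed{incorrect}` after each step) and the names `parse_verification` and `Verdict` are assumptions made for illustration, not something this card specifies; adapt the pattern to whatever format the model actually emits.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# ASSUMPTION: each step's verdict appears in the verification text as
# \boxed{correct} or \boxed{incorrect}. This format is hypothetical here.
STEP_LABEL = re.compile(r"\\boxed\{(correct|incorrect)\}")


@dataclass
class Verdict:
    step_labels: List[bool]      # True = step judged correct
    first_error: Optional[int]   # 1-based index of first incorrect step, if any
    solution_ok: bool            # True only if at least one step and none incorrect


def parse_verification(cot: str) -> Verdict:
    """Extract step-level labels from a verification chain-of-thought."""
    labels = [label == "correct" for label in STEP_LABEL.findall(cot)]
    first_error = next((i + 1 for i, ok in enumerate(labels) if not ok), None)
    # An empty verification is treated as "not verified" rather than correct.
    return Verdict(labels, first_error, bool(labels) and first_error is None)


if __name__ == "__main__":
    # Hand-written stand-in for model output, not real ThinkPRM-14B text.
    sample = "\n".join([
        r"Step 1: The expansion is carried out properly. \boxed{correct}",
        r"Step 2: The sign flips incorrectly when subtracting. \boxed{incorrect}",
    ])
    print(parse_verification(sample))
```

Treating a solution as correct only when every step is labeled correct is one simple aggregation choice; averaging step labels into a score, or ranking several candidate solutions by it, would be equally reasonable under the same assumptions.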
