gaotang/RM-R1-Qwen2.5-Instruct-14B
Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Context Length: 32k · Published: May 6, 2025 · License: MIT · Architecture: Transformer · Open Weights

gaotang/RM-R1-Qwen2.5-Instruct-14B is a 14.8-billion-parameter Reasoning Reward Model (ReasRM) built on Qwen2.5-Instruct and released under the gaotang namespace. It is trained in two stages: distillation of high-quality reasoning traces, followed by Reinforcement Learning with Verifiable Rewards (RLVR). The model judges a pair of candidate answers by first generating a structured rubric or reasoning trace and then emitting a preference, so its verdicts come with interpretable justifications.
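The judge-then-prefer workflow above can be sketched as a prompt builder. This is a minimal illustration only: the `<think>`/`<answer>` tag names, the `[[A]]`/`[[B]]` verdict markers, and the instruction wording are assumptions for demonstration, not the model's exact chat template.

```python
# Hypothetical sketch of a pairwise-judgment prompt for an RM-R1-style
# reasoning reward model. Tag names and verdict format are illustrative
# assumptions, not the model's documented template.

def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge to reason first, then emit a preference."""
    return (
        "You are a reward model. First reason step by step inside "
        "<think>...</think>, then state your preference inside "
        "<answer>...</answer> as either [[A]] or [[B]].\n\n"
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}"
    )

prompt = build_judge_prompt(
    "What is 2 + 2?",
    "2 + 2 = 4.",
    "2 + 2 = 5.",
)
print(prompt)
```

The prompt would then be sent to the model via its chat interface, and the preference parsed out of the `<answer>` span.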
