gaotang/RM-R1-Qwen2.5-Instruct-7B
Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: May 6, 2025 · License: MIT · Architecture: Transformer

gaotang/RM-R1-Qwen2.5-Instruct-7B is a 7.6-billion-parameter reward model built on the instruction-tuned Qwen2.5 base, released by gaotang as part of the RM-R1 framework. It is a Reasoning Reward Model (ReasRM): given a prompt and candidate answers, it first generates an evaluation rubric or reasoning trace and then emits a preference. Because its judgments come with explicit justifications, it is well suited to tasks that require transparent evaluation, such as policy optimization in RLHF/RLAIF pipelines.
