Name: THU-KEG/PairJudge-RM API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: THU-KEG

THU-KEG/PairJudge-RM: A Reward Model for Mathematical Reasoning

PairJudge RM is a 7.6 billion parameter reward model developed by THU-KEG, specifically designed to improve Best-of-N sampling for mathematical reasoning tasks. Unlike traditional reward models that assign absolute scores, PairJudge RM evaluates candidate solutions in pairs, determining which one is more correct through a transparent, step-by-step verification process.

Key Capabilities

Pairwise Judgment: Compares two candidate solutions simultaneously to identify the superior one.
Chain-of-Thought (CoT) Reasoning: Employs CoT to meticulously verify each step within the candidate solutions, providing clear and interpretable evaluations.
Enhanced Best-of-N Sampling: Facilitates a knockout tournament strategy to select the optimal solution from multiple candidates, particularly beneficial for complex mathematical problems.

Model Architecture and Training

PairJudge RM is built upon a pre-trained language model, specifically fine-tuned from Qwen-2.5-7B-Instruct. It was trained on the extensive PAIRJUDGE-432K dataset using the Adam optimizer with a learning rate of 1×10⁻⁵, a batch size of 128, over 8 epochs.

Good For

Evaluating mathematical problem-solving: Provides a robust method for assessing the correctness of different approaches to math problems.
Improving LLM outputs for reasoning tasks: Can be integrated into workflows to select higher-quality responses from language models.
Research in reward modeling: Offers a novel approach to reward signal generation based on comparative CoT reasoning.

For more technical details, refer to the PairJudge RM paper and the official code repository.

Overview

THU-KEG/PairJudge-RM: A Reward Model for Mathematical Reasoning

Key Capabilities

Model Architecture and Training

Good For

Full Model Card (README)