OpenRubrics/RubricRM-4B-Judge-v2
OpenRubrics/RubricRM-4B-Judge-v2 is a 4-billion-parameter RubricRM-Judge model developed by OpenRubrics and fine-tuned from Qwen/Qwen3-4B. It compares two responses (A and B) against a given instruction and a detailed rubric, judging them step by step: it identifies gatekeeper criteria and cites concrete evidence for its decisions. This makes it suited to automated quality assessment and reward modeling in LLM alignment.
OpenRubrics/RubricRM-4B-Judge-v2 Overview
OpenRubrics/RubricRM-4B-Judge-v2 is a 4-billion-parameter language model developed by OpenRubrics and fine-tuned from Qwen/Qwen3-4B. It is trained to act as an impartial judge that compares two responses (Response A and Response B) against a provided instruction and a detailed rubric.
Key Capabilities
- Structured Evaluation: The model follows a multi-phase evaluation process, starting with a compliance check.
- Gatekeeper Criterion Identification: It can identify the single most important, objective 'Gatekeeper Criterion' from a rubric, explaining its reasoning.
- Detailed Response Analysis: For each response, the model evaluates against all rubric criteria, providing step-by-step reasoning and citing concrete evidence.
- Impartial Final Judgment: It aggregates findings from the analysis phases to determine a winner (Response A or Response B) with a clear justification.
- Strict Output Formatting: The model is designed to adhere to a precise output format, ensuring consistency and parseability of its judgments.
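Because the judgment is meant to be parseable, a caller typically needs two small pieces: a prompt that packs the instruction, rubric, and both responses together, and a parser that pulls the winner out of the generated text. The sketch below illustrates both. The prompt layout and the `Winner: Response A` verdict line are assumptions made for illustration, not the model's documented template; check the model card or paper for the exact format before relying on it. The helper names `build_judge_prompt` and `parse_winner` are hypothetical.

```python
import re
from typing import List, Optional


def build_judge_prompt(
    instruction: str, rubric: List[str], response_a: str, response_b: str
) -> str:
    """Assemble a judge prompt. The section layout is an assumed, illustrative one."""
    # Number the rubric criteria so the judge can refer to them individually.
    rubric_text = "\n".join(f"{i}. {c}" for i, c in enumerate(rubric, start=1))
    return (
        f"Instruction:\n{instruction}\n\n"
        f"Rubric:\n{rubric_text}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n"
    )


# Assumed verdict line such as "Winner: Response A" (case-insensitive).
_WINNER_RE = re.compile(r"winner\s*:\s*(?:response\s*)?([AB])\b", re.IGNORECASE)


def parse_winner(judgment: str) -> Optional[str]:
    """Return "A" or "B" from the judge's text, or None if no verdict is found."""
    matches = _WINNER_RE.findall(judgment)
    # Take the last match so that earlier mentions inside the reasoning
    # do not override the final verdict.
    return matches[-1].upper() if matches else None
```

Returning `None` instead of guessing lets the caller retry or discard outputs that do not follow the expected format.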
Good For
- Automated LLM Evaluation: Ideal for programmatic assessment of LLM outputs against specific criteria.
- Reward Modeling: Can be integrated into systems requiring structured feedback for reinforcement learning from human feedback (RLHF) or similar alignment techniques.
- Quality Assurance: Useful for ensuring generated content meets predefined standards and constraints.
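For reward modeling, the pairwise verdict has to be turned into a preference label. One common precaution with pairwise judges, not something this model card prescribes, is to query both orderings and keep the label only when the two verdicts agree, which filters out position-dependent decisions. A minimal sketch, assuming a hypothetical `judge` callable that wraps the model and returns `"A"`, `"B"`, or `None`:

```python
from typing import Callable, Optional

# Hypothetical judge signature: (instruction, rubric, first, second) -> "A" | "B" | None
Judge = Callable[[str, str, str, str], Optional[str]]


def consistent_preference(
    judge: Judge, instruction: str, rubric: str, resp_1: str, resp_2: str
) -> Optional[int]:
    """Return 0 if resp_1 wins in both orderings, 1 if resp_2 does, else None."""
    forward = judge(instruction, rubric, resp_1, resp_2)   # resp_1 shown as A
    backward = judge(instruction, rubric, resp_2, resp_1)  # resp_1 shown as B
    if forward == "A" and backward == "B":
        return 0
    if forward == "B" and backward == "A":
        return 1
    # Verdicts disagree or are unparseable: treat the pair as unlabeled.
    return None


def longer_wins(instruction: str, rubric: str, first: str, second: str) -> str:
    """Stand-in judge for demonstration only: prefers the longer response."""
    return "A" if len(first) >= len(second) else "B"


label = consistent_preference(longer_wins, "Explain X.", "Be thorough.", "brief", "much longer answer")
```

Here `label` is the index of the preferred response, which can serve as the chosen/rejected label for a preference dataset. Dropping inconsistent pairs costs two model calls per comparison and discards some data in exchange for cleaner labels.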
This model is intended to be used in conjunction with a RubricRM-Rubric generator to provide the necessary rubric input for its evaluation process. For more details, refer to the OpenRubrics paper.