OpenRubrics/RubricRM-4B-Judge-v2

Text Generation · Model Size: 4B · Quantization: BF16 · Context Length: 32k · Published: Jan 11, 2026 · Architecture: Transformer

OpenRubrics/RubricRM-4B-Judge-v2 is a 4 billion parameter RubricRM-Judge model developed by OpenRubrics, fine-tuned from Qwen3/Qwen3-4B. This model is specifically designed for impartial evaluation of two responses (A and B) against a given instruction and a detailed rubric. It excels at structured, step-by-step judgment, including identifying gatekeeper criteria and providing concrete evidence for its decisions, making it ideal for automated quality assessment and reward modeling in LLM alignment.


OpenRubrics/RubricRM-4B-Judge-v2 Overview

OpenRubrics/RubricRM-4B-Judge-v2 is a 4 billion parameter language model developed by OpenRubrics, built upon the Qwen3/Qwen3-4B architecture. It is specifically fine-tuned to act as an impartial judge for evaluating and comparing two distinct responses (Response A and Response B) based on a provided instruction and a detailed rubric.
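
Below is a minimal usage sketch with the Hugging Face transformers library. The prompt layout (instruction, rubric, then Response A and Response B in a single user turn) is an assumption for illustration only; consult the official prompt template before relying on it.

```python
# Minimal sketch: load the judge and ask it to compare two responses under a rubric.
# The judge_prompt layout below is an assumption, not the model's documented template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenRubrics/RubricRM-4B-Judge-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

judge_prompt = (
    "Instruction:\n{instruction}\n\n"
    "Rubric:\n{rubric}\n\n"
    "Response A:\n{response_a}\n\n"
    "Response B:\n{response_b}\n\n"
    "Evaluate both responses against the rubric and declare a winner."
)

messages = [{"role": "user", "content": judge_prompt.format(
    instruction="Summarize the article in three sentences.",
    rubric="1. Exactly three sentences. 2. Covers the main findings. 3. No added claims.",
    response_a="...",
    response_b="...",
)}]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))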

Key Capabilities

  • Structured Evaluation: The model follows a multi-phase evaluation process, starting with a compliance check.
  • Gatekeeper Criterion Identification: It can identify the single most important, objective 'Gatekeeper Criterion' from a rubric, explaining its reasoning.
  • Detailed Response Analysis: For each response, the model evaluates against all rubric criteria, providing step-by-step reasoning and citing concrete evidence.
  • Impartial Final Judgment: It aggregates findings from the analysis phases to determine a winner (Response A or Response B) with a clear justification.
  • Strict Output Formatting: The model is designed to adhere to a precise output format, ensuring consistency and parseability of its judgments (see the parsing sketch after this list).
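
Because the output format is strict and parseable, downstream code can extract the verdict mechanically. The sketch below assumes hypothetical section headers ("Gatekeeper Criterion:", "Final Judgment:") purely for illustration; adapt the patterns to the model's actual output format.

```python
# Parsing sketch under an assumed output layout: the section headers matched here
# are hypothetical placeholders, not the model's documented format.
import re

def parse_judgment(text: str) -> dict:
    """Extract the gatekeeper criterion and the declared winner from judge output."""
    gatekeeper = re.search(r"Gatekeeper Criterion:\s*(.+)", text)
    winner = re.search(r"Final Judgment:\s*(Response [AB])", text)
    return {
        "gatekeeper": gatekeeper.group(1).strip() if gatekeeper else None,
        "winner": winner.group(1) if winner else None,
    }

example_output = (
    "Gatekeeper Criterion: The summary must be exactly three sentences.\n"
    "...\n"
    "Final Judgment: Response B"
)
print(parse_judgment(example_output))  # winner: 'Response B'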

Good For

  • Automated LLM Evaluation: Ideal for programmatic assessment of LLM outputs against specific criteria.
  • Reward Modeling: Can be integrated into systems requiring structured feedback for reinforcement learning from human feedback (RLHF) or similar alignment techniques (a sketch of this mapping follows this list).
  • Quality Assurance: Useful for ensuring generated content meets predefined standards and constraints.
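
For reward modeling, an A/B verdict maps naturally onto a (chosen, rejected) preference pair. The helper below is a hypothetical sketch of that mapping, not part of any published OpenRubrics API.

```python
# Hypothetical helper: convert a judge verdict into a preference pair for
# DPO/RLHF-style training data. Field names follow common preference-dataset
# conventions, not an OpenRubrics specification.
def build_preference_pair(instruction: str, response_a: str, response_b: str, winner: str) -> dict:
    """Map an A/B verdict onto a (chosen, rejected) pair for preference training."""
    chosen, rejected = (response_a, response_b) if winner == "Response A" else (response_b, response_a)
    return {"prompt": instruction, "chosen": chosen, "rejected": rejected}
```

The resulting records drop directly into most preference-training pipelines that expect prompt/chosen/rejected triples.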

This model is intended to be used in conjunction with a RubricRM-Rubric generator to provide the necessary rubric input for its evaluation process. For more details, refer to the OpenRubrics paper.
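
A rough sketch of that two-stage pipeline is shown below. The rubric-generator repo id and the prompt wording are placeholders, not taken from the model card.

```python
# Two-stage sketch: generate a rubric first, then pass it to the judge.
# RUBRIC_GENERATOR_ID is a placeholder; fill in the actual RubricRM-Rubric repo id.
from transformers import pipeline

RUBRIC_GENERATOR_ID = "OpenRubrics/..."  # placeholder
rubric_gen = pipeline("text-generation", model=RUBRIC_GENERATOR_ID)
judge = pipeline("text-generation", model="OpenRubrics/RubricRM-4B-Judge-v2")

instruction = "Summarize the article in three sentences."
rubric = rubric_gen(
    f"Write a grading rubric for this task:\n{instruction}", max_new_tokens=256
)[0]["generated_text"]
verdict = judge(
    f"Instruction:\n{instruction}\n\nRubric:\n{rubric}\n\n"
    "Response A:\n...\n\nResponse B:\n...",
    max_new_tokens=512,
)[0]["generated_text"]
print(verdict)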