OpenRubrics/RubricRM-8B-Judge-v2

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 11, 2026 · Architecture: Transformer · Cold

OpenRubrics/RubricRM-8B-Judge-v2 is an 8 billion parameter RubricRM-Judge model, fine-tuned from Qwen3-8B. This model is specifically designed for evaluating and judging responses based on a provided rubric, excelling at impartial, step-by-step assessment of 'Response A' and 'Response B' against defined criteria. It focuses on structured evaluation, including compliance checks and detailed criterion-by-criterion analysis, making it suitable for automated content moderation and quality assurance tasks.


OpenRubrics/RubricRM-8B-Judge-v2 Overview

OpenRubrics/RubricRM-8B-Judge-v2 is an 8 billion parameter language model fine-tuned from Qwen3-8B. Its primary function is to act as a fair and impartial judge, evaluating two responses ('Response A' and 'Response B') against a given instruction and a detailed rubric. The model is engineered to perform a structured, multi-phase evaluation process.
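A judging request can be assembled as ordinary chat messages. The sketch below is a minimal assumption about the prompt layout (instruction, rubric, then the two responses); it is not the model's documented template, so adapt it to the actual format the OpenRubrics project specifies.

```python
def build_judge_messages(instruction: str, rubric: str,
                         response_a: str, response_b: str) -> list[dict]:
    """Assemble chat messages asking the judge to compare two responses.

    The prompt layout here is an illustrative assumption, not the
    model's official template.
    """
    user_prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Rubric:\n{rubric}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Evaluate both responses against the rubric and declare a winner."
    )
    return [
        {"role": "system", "content": "You are a fair and impartial judge."},
        {"role": "user", "content": user_prompt},
    ]

messages = build_judge_messages(
    "Summarize the article in one sentence.",
    "1. Accuracy  2. Brevity  3. Fluency",
    "A one-sentence summary of the article.",
    "A three-paragraph rewrite of the article.",
)
```

These messages can then be sent to any OpenAI-compatible endpoint serving the model (e.g. vLLM) with the model name `OpenRubrics/RubricRM-8B-Judge-v2`.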

Key Capabilities

  • Compliance Check: Identifies the single most important, objective 'Gatekeeper Criterion' from the rubric and provides reasoning.
  • Detailed Analysis: Evaluates each response item-by-item against all rubric criteria, citing concrete evidence for judgments.
  • Final Judgment: Determines a winner based on aggregated findings from the compliance check and detailed analysis, providing a clear justification.
  • Structured Output: Adheres to a strict output format, ensuring consistent and parseable evaluation results.
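Because the output format is strict and parseable, downstream code can split an evaluation into its three phases and extract the winner. The parser below is a hypothetical sketch: the section headers ("Compliance Check", "Detailed Analysis", "Final Judgment") mirror the phases listed above, but the exact markers the model emits are an assumption, so the patterns should be adjusted to the real output.

```python
import re

def parse_judgment(text: str) -> dict:
    """Split a judge evaluation into its three phases and extract the winner.

    Assumes markdown-style '## <Phase>' headers; the real delimiters
    may differ.
    """
    sections = {}
    pattern = r"## (Compliance Check|Detailed Analysis|Final Judgment)\n(.*?)(?=\n## |\Z)"
    for name, body in re.findall(pattern, text, flags=re.DOTALL):
        sections[name] = body.strip()
    final = sections.get("Final Judgment", "")
    match = re.search(r"\bResponse ([AB])\b", final)
    sections["winner"] = match.group(1) if match else None
    return sections

sample = (
    "## Compliance Check\nGatekeeper criterion: factual accuracy. Both pass.\n"
    "## Detailed Analysis\nCriterion 1 (brevity): Response A is more concise.\n"
    "## Final Judgment\nResponse A wins on brevity while matching accuracy."
)
result = parse_judgment(sample)
# result["winner"] -> "A"
```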

Use Cases

This model is particularly well-suited for applications requiring automated, objective content evaluation and comparison. Potential use cases include:

  • Automated Grading: Assessing student responses or creative writing based on predefined rubrics.
  • Content Moderation: Evaluating user-generated content against policy guidelines.
  • Quality Assurance: Comparing different AI model outputs or human-generated content for adherence to specific standards.
  • Reward Modeling: Providing structured feedback for training other language models, as suggested by its origin in the OpenRubrics project for LLM alignment.
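For the reward-modeling use case, each verdict can be turned into a (chosen, rejected) preference pair for reward-model or DPO-style training. The record layout below is an illustrative assumption, not an OpenRubrics data format.

```python
def to_preference_pair(instruction: str, response_a: str,
                       response_b: str, winner: str) -> dict:
    """Order the two responses as (chosen, rejected) based on the verdict.

    The output schema is a common preference-data convention, assumed
    here for illustration.
    """
    if winner not in ("A", "B"):
        raise ValueError(f"unexpected winner label: {winner!r}")
    chosen, rejected = ((response_a, response_b) if winner == "A"
                        else (response_b, response_a))
    return {"prompt": instruction, "chosen": chosen, "rejected": rejected}

pair = to_preference_pair(
    "Explain recursion.", "Clear answer.", "Vague answer.", winner="A",
)
# pair["chosen"] -> "Clear answer."
```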