OpenRubrics/RubricARROW-8B-Judge

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: May 27, 2026 | Architecture: Transformer

OpenRubrics/RubricARROW-8B-Judge is an 8-billion-parameter causal language model, fine-tuned from Qwen3-8B, that evaluates AI-generated content against predefined rubric items. For each rubric item it produces a detailed explanation and a boolean judgment on whether the last turn of a conversation meets the criterion, returned as structured JSON. It is intended for automated quality assessment and feedback generation in non-verifiable domains.


OpenRubrics/RubricARROW-8B-Judge Overview

OpenRubrics/RubricARROW-8B-Judge is an 8-billion-parameter model, fine-tuned from Qwen3-8B, that specializes in automated rubric-based evaluation of AI-generated text. Given a conversation and a rubric item, it assesses the assistant's response against that item and returns the result as structured JSON.

Key Capabilities

  • Rubric-based Evaluation: Judges how well the last turn of a conversation adheres to specific rubric criteria.
  • Detailed Explanations: Generates a string explanation for each rubric item, detailing why the criteria were or were not met.
  • Boolean Judgment: Provides a true/false boolean indicating whether the response fully meets the rubric item's criteria.
  • Structured Output: Returns evaluations in a precise JSON format that is easy to consume programmatically.
  • Customizable Prompting: Uses a JUDGE_PROMPT_TEMPLATE to give the model clear instructions for each evaluation task.
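
The prompt-and-parse flow described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model's documented interface: the template text below is a hypothetical stand-in for the real JUDGE_PROMPT_TEMPLATE, and the JSON field names `explanation` and `criteria_met` are assumed because this page names only the field types (a string and a boolean). The helper names `build_prompt` and `parse_judgment` are likewise hypothetical.

```python
import json
from dataclasses import dataclass

# ASSUMPTION: placeholder wording. The model's actual JUDGE_PROMPT_TEMPLATE
# is not reproduced on this page and may differ substantially.
JUDGE_PROMPT_TEMPLATE = (
    "Evaluate the last turn of the conversation against the rubric item.\n\n"
    "Conversation:\n{conversation}\n\n"
    "Rubric item:\n{rubric_item}\n\n"
    'Respond with JSON: {{"explanation": <string>, "criteria_met": <true|false>}}'
)


@dataclass
class Judgment:
    explanation: str    # why the criterion was or was not met
    criteria_met: bool  # True only if the response fully meets the rubric item


def build_prompt(conversation: list[dict], rubric_item: str) -> str:
    """Render role/content turns and one rubric item into a judge prompt."""
    turns = "\n".join(f"{t['role']}: {t['content']}" for t in conversation)
    return JUDGE_PROMPT_TEMPLATE.format(conversation=turns, rubric_item=rubric_item)


def parse_judgment(raw: str) -> Judgment:
    """Extract and validate the JSON object from raw judge output."""
    # Tolerate stray text around the object by slicing from the first "{"
    # to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in judge output")
    data = json.loads(raw[start:end + 1])
    explanation = data.get("explanation")  # ASSUMED field name
    met = data.get("criteria_met")         # ASSUMED field name
    if not isinstance(explanation, str) or not isinstance(met, bool):
        raise ValueError("judge output has missing or mistyped fields")
    return Judgment(explanation=explanation, criteria_met=met)
```

Validating the field types before use means that a malformed generation fails loudly instead of silently counting as a pass or a fail. If the real output schema uses different keys, only the two `data.get(...)` lines need to change.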

Good For

  • Automated Content Moderation: Assessing AI-generated responses for adherence to guidelines.
  • Quality Assurance: Evaluating the performance of other LLMs based on predefined standards.
  • Feedback Generation: Providing structured feedback on conversational AI outputs.
  • Research in Non-verifiable Domains: Supports the broader RUBRIC-ARROW framework for LLM post-training, particularly in areas where objective ground truth is difficult to establish.
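
For quality-assurance use, the per-item boolean judgments can be combined into a single score for a response. The sketch below uses a weighted pass fraction, which is an assumption chosen for illustration; this page does not describe how RUBRIC-ARROW aggregates judgments or turns them into a training signal. The function name `rubric_score` and the optional `weights` parameter are hypothetical.

```python
from typing import Optional, Sequence


def rubric_score(
    judgments: Sequence[bool],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Return the weighted fraction of rubric items judged as met, in [0, 1]."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    if weights is None:
        # Default: every rubric item counts equally.
        weights = [1.0] * len(judgments)
    if len(weights) != len(judgments):
        raise ValueError("weights and judgments must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    passed = sum(w for met, w in zip(judgments, weights) if met)
    return passed / total
```

With equal weights, a response that meets three of four rubric items scores 0.75. Weights let a few critical items dominate the score without changing how each individual judgment is produced.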