OpenRubrics/RubricARROW-8B-Judge
OpenRubrics/RubricARROW-8B-Judge is an 8-billion-parameter causal language model, fine-tuned from Qwen3-8B, that evaluates AI-generated content against predefined rubric items. For each rubric item it produces a detailed explanation and a boolean judgment on whether the last turn of a conversation meets the criterion, returned as structured JSON. It is intended for automated quality assessment and feedback generation in non-verifiable domains.
OpenRubrics/RubricARROW-8B-Judge Overview
OpenRubrics/RubricARROW-8B-Judge is an 8-billion-parameter model, fine-tuned from Qwen3-8B, that specializes in automated rubric-based evaluation of AI-generated text. Its core function is to assess an assistant's response within a conversation against a given rubric item and return the result as structured JSON.
Key Capabilities
- Rubric-based Evaluation: Judges how well the last turn of a conversation adheres to specific rubric criteria.
- Detailed Explanations: Generates a string explanation for each rubric item, detailing why the criteria were or were not met.
- Boolean Judgment: Provides a `true`/`false` boolean indicating whether the response fully meets the rubric item's criteria.
- Structured Output: Returns evaluations in a precise JSON format, making it easy for programmatic consumption.
- Customizable Prompting: Utilizes a `JUDGE_PROMPT_TEMPLATE` for clear instruction on evaluation tasks.
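Because the judge returns an explanation string and a boolean in JSON, downstream code will typically parse and validate that reply before using it. The following is a minimal sketch of that step. The key names `explanation` and `meets_criteria`, the `Judgment` record, and the `parse_judgment` helper are all hypothetical; the exact schema is defined by the model's own prompt template and should be checked there.

```python
import json
from dataclasses import dataclass


@dataclass
class Judgment:
    explanation: str      # why the rubric item was or was not met
    meets_criteria: bool  # True only if the item is fully met


def parse_judgment(raw: str) -> Judgment:
    """Parse a judge reply into a typed record.

    Assumption: the reply contains one JSON object with the hypothetical
    keys "explanation" (string) and "meets_criteria" (boolean).
    """
    # Tolerate text around the JSON by taking the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in judge output")
    data = json.loads(raw[start : end + 1])

    explanation = data.get("explanation")
    verdict = data.get("meets_criteria")
    # Reject near-misses such as "yes" or 1 instead of a real boolean.
    if not isinstance(explanation, str) or not isinstance(verdict, bool):
        raise ValueError("unexpected field types in judge output")
    return Judgment(explanation, verdict)
```

Validating the types explicitly, rather than trusting the reply, keeps a malformed generation from being silently counted as a pass or a fail.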
Good For
- Automated Content Moderation: Assessing AI-generated responses for adherence to guidelines.
- Quality Assurance: Evaluating the performance of other LLMs based on predefined standards.
- Feedback Generation: Providing structured feedback on conversational AI outputs.
- Research in Non-verifiable Domains: As part of the broader RUBRIC-ARROW framework for LLM post-training, particularly in areas where objective ground truth is difficult to establish.
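For quality-assurance use, the per-item boolean judgments usually need to be combined into a single number per response. The sketch below uses an unweighted pass rate, which is an illustrative assumption and not necessarily how the RUBRIC-ARROW framework aggregates rubric items; the `rubric_pass_rate` helper and the example rubric items are hypothetical.

```python
from typing import Mapping


def rubric_pass_rate(verdicts: Mapping[str, bool]) -> float:
    """Return the fraction of rubric items judged as fully met.

    An empty rubric yields 0.0 rather than raising.
    """
    if not verdicts:
        return 0.0
    # True counts as 1 and False as 0 when summed.
    return sum(verdicts.values()) / len(verdicts)


# Hypothetical judgments for one assistant response, keyed by rubric item.
verdicts = {
    "cites a source": True,
    "stays under 200 words": False,
    "answers the question asked": True,
    "polite tone": True,
}
print(rubric_pass_rate(verdicts))  # 0.75
```

A weighted sum or an all-items-must-pass rule would fit the same interface if some rubric items matter more than others.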