Name: OpenRubrics/RubricARROW-8B-Judge API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: OpenRubrics

OpenRubrics/RubricARROW-8B-Judge Overview

This is an 8-billion-parameter judge model fine-tuned from Qwen/Qwen3-8B. It is developed by OpenRubrics as part of the RUBRIC-ARROW framework, which uses alternating pointwise rubric reward modeling for LLM post-training, particularly in domains where responses are hard to verify automatically.

Key Capabilities

Automated LLM Response Evaluation: Designed to assess the quality of LLM-generated responses against specific rubric items.
Detailed Explanations: Provides a string explanation for why a response does or does not meet a given criterion.
Boolean Criteria Assessment: Outputs a true/false boolean indicating whether each rubric item's criteria are fully met.
JSON Output Format: Returns its evaluation as a single JSON object, so the result is easy to parse programmatically.
Rubric-Driven Scoring: Integrates with a probability-based scoring mechanism that can assign weights to different rubric tags (e.g., "hard rule," "principle").
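
The capabilities above can be illustrated with a minimal Python sketch that parses one judge output and combines several rubric items into a weighted score. Everything specific here is an assumption made for illustration, not the documented interface: the JSON field names `explanation` and `criteria_met`, the weights in `TAG_WEIGHTS`, and the idea that each rubric item comes with a probability `p_met` that its criterion is satisfied. The full model card is the place to check the actual schema and scoring rule.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    # Hypothetical field names; check the model card for the real schema.
    explanation: str
    criteria_met: bool


def parse_verdict(raw: str) -> Verdict:
    """Parse one judge output, tolerating stray text around the JSON object."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in judge output")
    data = json.loads(raw[start:end + 1])
    met = data["criteria_met"]
    if not isinstance(met, bool):
        raise ValueError("criteria_met must be a JSON boolean")
    return Verdict(explanation=str(data["explanation"]), criteria_met=met)


# Assumed tag weights, chosen only for illustration.
TAG_WEIGHTS = {"hard rule": 2.0, "principle": 1.0}


def weighted_score(items):
    """Weight-normalised average of per-item probabilities.

    items: list of (tag, p_met) pairs, where p_met in [0, 1] is the
    (assumed) probability that the rubric item's criterion is met.
    """
    total = sum(TAG_WEIGHTS[tag] for tag, _ in items)
    if total == 0:
        return 0.0
    return sum(TAG_WEIGHTS[tag] * p for tag, p in items) / total


if __name__ == "__main__":
    raw = '{"explanation": "The response cites no sources.", "criteria_met": false}'
    print(parse_verdict(raw))
    print(weighted_score([("hard rule", 1.0), ("principle", 0.5)]))
```

If only the boolean is available, `p_met` can be set to 1.0 or 0.0, which reduces the score to a weighted fraction of satisfied rubric items.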

Good For

LLM Post-training and Fine-tuning: Ideal for generating reward signals to improve LLM performance in non-verifiable domains.
Automated Quality Assurance: Can be used to automatically check if LLM outputs adhere to specific guidelines or requirements.
Developer Tooling: Provides a structured way to evaluate and debug LLM responses based on explicit criteria.
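
As a rough sketch of the quality-assurance and debugging uses above, the Python function below turns a list of per-item verdicts into a pass/fail report. The record keys (`criterion`, `tag`, `criteria_met`, `explanation`) and the policy that an output passes only when every "hard rule" item is met are assumptions made for this example, not behavior documented for the model.

```python
def qa_report(results):
    """Summarise judge verdicts for one LLM output.

    results: list of dicts with the (hypothetical) keys
    'criterion', 'tag', 'criteria_met', and 'explanation'.
    """
    # Assumed policy: unmet "principle" items are reported but do not fail the output.
    passed = all(r["criteria_met"] for r in results if r["tag"] == "hard rule")
    failures = [
        (r["criterion"], r["explanation"])
        for r in results
        if not r["criteria_met"]
    ]
    return {"passed": passed, "failures": failures}


if __name__ == "__main__":
    results = [
        {"criterion": "Answers in English", "tag": "hard rule",
         "criteria_met": True, "explanation": "The response is in English."},
        {"criterion": "Uses a friendly tone", "tag": "principle",
         "criteria_met": False, "explanation": "The tone is curt."},
    ]
    print(qa_report(results))
```

The explanations attached to each failure give a starting point for debugging, since they state why the judge considered a criterion unmet.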

This model is particularly useful for scenarios where human evaluation is costly or time-consuming, offering a scalable solution for assessing LLM output quality based on predefined rubrics.
