OpenRubrics/RubricRM-4B-Rubric-v2
OpenRubrics/RubricRM-4B-Rubric-v2 is a 4-billion-parameter Rubric Reward Model (v2) developed by OpenRubrics and fine-tuned from Qwen3/Qwen3-4B. The model extracts and generates rubric-style evaluation criteria from user requests, distinguishing between 'Hard Rules' and 'Principles'. It produces comprehensive, concise, and distinct rubrics for assessing AI response quality, making it well suited to reward modeling and LLM alignment tasks.
Model Overview
OpenRubrics/RubricRM-4B-Rubric-v2 is a 4-billion-parameter model fine-tuned from the Qwen3/Qwen3-4B architecture. Its primary function is to generate structured, rubric-style evaluation criteria from user prompts; these criteria are then used to assess the quality of AI-generated responses. The model is a key component in reward modeling and LLM alignment efforts, aiming to make synthetic rubric generation scalable.
Key Capabilities
- Rubric Extraction: Automatically identifies and extracts evaluation criteria from natural language requests.
- Categorization: Distinguishes between two types of rubrics:
  - Hard Rules: Derived from explicit requirements (e.g., format, length, forbidden elements).
  - Principles: Abstracted, domain-agnostic quality criteria (e.g., clarity, correctness, reasoning).
- Comprehensive & Concise: Ensures rubrics cover all critical aspects of a request while remaining distinct and free of redundancy.
- Structured Output: Generates rubrics as a numbered list in which each item starts with "The response" and ends with [Hard Rule] or [Principle].
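Because each rubric item carries a trailing [Hard Rule] or [Principle] tag, the output can be split programmatically. A minimal parsing sketch, where the sample rubric text is illustrative and not actual model output:

```python
import re

def parse_rubrics(text: str) -> dict:
    """Split a numbered rubric list into hard rules and principles.

    Assumes each item ends with "[Hard Rule]" or "[Principle]",
    per the documented output format.
    """
    rubrics = {"hard_rules": [], "principles": []}
    for line in text.strip().splitlines():
        # Strip the leading "1. "-style numbering.
        item = re.sub(r"^\s*\d+\.\s*", "", line).strip()
        if item.endswith("[Hard Rule]"):
            rubrics["hard_rules"].append(item[: -len("[Hard Rule]")].strip())
        elif item.endswith("[Principle]"):
            rubrics["principles"].append(item[: -len("[Principle]")].strip())
    return rubrics

# Illustrative sample in the documented format.
sample = """\
1. The response must be under 200 words. [Hard Rule]
2. The response should reason step by step. [Principle]
3. The response must not include code. [Hard Rule]"""

parsed = parse_rubrics(sample)
```

Separating the two categories this way lets downstream evaluators treat hard rules as pass/fail checks while scoring principles on a graded scale.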
Use Cases
This model is particularly well-suited for:
- Automated Evaluation: Generating criteria for automated assessment of LLM outputs.
- Reward Modeling: Creating synthetic rubrics to guide and improve the performance of large language models.
- LLM Alignment: Assisting in the alignment process by providing clear, structured feedback mechanisms.
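For the automated-evaluation use case, extracted hard rules can drive simple pass/fail checks on candidate responses. A hypothetical sketch: real rubric items are free-form text, so this checker only recognizes two illustrative rule shapes (word limits and forbidden code) and is not part of the model itself:

```python
import re

def check_hard_rules(response: str, hard_rules: list) -> dict:
    """Check a response against a few recognizable hard-rule patterns.

    Hypothetical helper for illustration: unrecognized rules pass by
    default rather than being judged.
    """
    results = {}
    for rule in hard_rules:
        limit = re.search(r"under (\d+) words", rule)
        if limit:
            # Word-limit rule, e.g. "must be under 200 words".
            results[rule] = len(response.split()) < int(limit.group(1))
        elif "must not include code" in rule:
            # Forbidden-element rule: treat fenced blocks as code.
            results[rule] = "```" not in response
        else:
            results[rule] = True
    return results

rules = [
    "The response must be under 200 words.",
    "The response must not include code.",
]
verdicts = check_hard_rules("A short, plain-text answer.", rules)
```

Per-rule verdicts like these can then be aggregated into a reward signal, with principle-style criteria scored separately by a judge model.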
For more technical details and to cite this work, please refer to the associated paper: OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment.