rubricreward/R3-Qwen3-14B-14k

14B
FP8
32768
May 14, 2025
License: apache-2.0
Overview

R3-Qwen3-14B-14k: A Robust Rubric-Agnostic Reward Model

R3-Qwen3-14B-14k is a 14-billion-parameter reward model developed by rubricreward as part of their R3 (Robust Rubric-Agnostic Reward Models) family. It is fine-tuned from Qwen/Qwen3-14B and specializes in scoring responses against a provided rubric and explaining each score with reasoning.

Key Capabilities

  • Rubric-Agnostic Evaluation: Designed to provide robust assessments across various tasks without being tied to a single rubric format.
  • Diverse Task Coverage: Trained on a curated R3 dataset drawn from 45 diverse sources and covering tasks such as classification, preference optimization, and question answering.
  • Detailed Assessment: Each training example includes an instruction, task description, input, response(s), evaluation rubrics, a score, and corresponding reasoning, enabling the model to generate fair and detailed assessments.
  • English Language Support: Primarily focused on English NLP tasks.
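
The fields listed under Detailed Assessment can be pictured as a small record type. The sketch below is only an illustration: the class name `RubricExample`, its field names, and the `### ...` section headers in `to_prompt` are assumptions made for this example, not the dataset's actual schema or the model's official prompt template.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RubricExample:
    """Hypothetical container mirroring the fields named in the model card."""
    instruction: str
    task_description: str
    input_text: str
    responses: List[str]
    rubric: str
    # Targets: present in training data, absent when asking the model to judge.
    score: Optional[str] = None
    reasoning: Optional[str] = None

    def to_prompt(self) -> str:
        """Render the judge-facing part of the example as plain text.

        The section headers are an assumed layout, not an official format.
        """
        parts = [
            self.instruction,
            "### TASK\n" + self.task_description,
            "### INPUT\n" + self.input_text,
        ]
        for i, resp in enumerate(self.responses, start=1):
            parts.append(f"### RESPONSE {i}\n{resp}")
        parts.append("### EVALUATION RUBRIC\n" + self.rubric)
        return "\n\n".join(parts)


example = RubricExample(
    instruction="Evaluate the response using the rubric.",
    task_description="Answer the question factually.",
    input_text="What is the capital of France?",
    responses=["Paris is the capital of France."],
    rubric="Score 1-5, where 5 means fully correct and complete.",
)
prompt = example.to_prompt()
print(prompt)
```

Keeping `score` and `reasoning` optional lets the same record describe both a labeled training example and an unlabeled evaluation request.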

When to Use This Model

This model is ideal for applications requiring automated, detailed, and robust evaluation of generated text. It is particularly well-suited for:

  • Automated Content Scoring: Assigning scores and providing reasoning for responses in various NLP tasks.
  • Preference Optimization: Evaluating and ranking different responses based on specific criteria.
  • Quality Assurance: Assessing the quality of generated content against defined rubrics.
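
As a rough illustration of the scoring and ranking use cases above, the sketch below parses a judge's output and orders candidate responses by score. It makes two assumptions that should be checked against the project page: that the model's final answer contains a JSON object with `score` and `explanation` keys, and that scores are numeric. The helper names `parse_assessment` and `rank_responses` are hypothetical, and `fake_judge` is a stand-in for a real model call.

```python
import json
import re
from typing import Callable, Dict, List, Tuple


def parse_assessment(raw: str) -> Dict[str, object]:
    """Parse the span from the first '{' to the last '}' as JSON.

    Assumes the judge emits a single JSON object with a 'score' key.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in judge output")
    data = json.loads(match.group(0))
    if "score" not in data:
        raise ValueError("judge output has no 'score' field")
    return data


def rank_responses(
    responses: List[str], judge: Callable[[str], str]
) -> List[Tuple[str, float, str]]:
    """Score each response with `judge` and sort from highest to lowest score."""
    scored = []
    for resp in responses:
        result = parse_assessment(judge(resp))
        scored.append((resp, float(result["score"]), str(result.get("explanation", ""))))
    # sorted() is stable, so tied responses keep their original order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def fake_judge(response: str) -> str:
    # Stand-in for a real model call: longer answers get a higher score (max 5).
    score = min(5, 1 + len(response.split()))
    return "Some reasoning text.\n" + json.dumps({"explanation": "stub", "score": score})


ranked = rank_responses(["Paris", "The capital of France is Paris."], fake_judge)
for resp, score, explanation in ranked:
    print(score, resp)
```

In practice `judge` would wrap whatever inference call serves the model; passing it in as a function keeps the parsing and ranking logic testable without a GPU.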

For more technical details, refer to the project page and the associated research paper.