rubricreward/mR3-Qwen3-8B-en-prompt-en-thinking

Task: Text Generation
Concurrency Cost: 1
Model Size: 8B
Quant: FP8
Ctx Length: 32k
Published: Sep 19, 2025
License: apache-2.0
Architecture: Transformer
Open Weights, Warm

rubricreward/mR3-Qwen3-8B-en-prompt-en-thinking is an 8 billion parameter reward model in the mR3 (Multilingual Rubric-Agnostic Reward Reasoning Models) family, fine-tuned from Qwen3-8B. It evaluates responses against detailed rubrics with explicit reasoning, and was trained on a curated dataset covering 72 languages for tasks such as classification, preference optimization, and question answering. The model returns a score together with an explanation for assistant responses, which suits automated content evaluation and quality control in multilingual contexts.


mR3-Qwen3-8B-en-prompt-en-thinking Overview

mR3-Qwen3-8B-en-prompt-en-thinking is an 8 billion parameter reward model, part of the Multilingual Rubric-Agnostic Reward Reasoning Models (mR3) family. It is fine-tuned from the Qwen3-8B base model and specializes in evaluating assistant responses against detailed rubrics, providing both a score and corresponding reasoning. The model has a context length of 32768 tokens.

Key Capabilities

  • Multilingual Evaluation: Trained on a diverse dataset covering 72 languages, enabling robust evaluation across a wide linguistic spectrum.
  • Rubric-Agnostic Reasoning: Produces reasoning and scores against a variety of evaluation rubrics, covering criteria such as safety, helpfulness, relevance, conciseness, politeness, and coverage.
  • Task Versatility: Applicable to a range of tasks such as classification, preference optimization, and question answering.
  • Detailed Feedback: Generates an explanation comparing responses and a clear verdict (e.g., 'Assistant A' or 'Assistant B').
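
The rubric-driven, pairwise workflow described in the list above can be sketched in plain Python. This is a minimal illustration, not the model's documented interface: the prompt layout in `build_pairwise_prompt`, the JSON keys `explanation` and `score`, and the `<think>...</think>` wrapper around the reasoning are all assumptions made for this example, and both helper names are hypothetical. The only detail taken from the card itself is the pair of verdict labels, 'Assistant A' and 'Assistant B'.

```python
import json
import re

# Verdict labels mentioned in the model card.
VALID_VERDICTS = ("Assistant A", "Assistant B")


def build_pairwise_prompt(rubric: str, question: str,
                          response_a: str, response_b: str) -> str:
    """Assemble a pairwise evaluation prompt.

    The layout is an assumption for illustration; the prompt format the
    model was actually trained on may differ.
    """
    return (
        f"Rubric:\n{rubric}\n\n"
        f"User question:\n{question}\n\n"
        f"Assistant A:\n{response_a}\n\n"
        f"Assistant B:\n{response_b}\n\n"
        "Explain which response better satisfies the rubric, then give a "
        "verdict: 'Assistant A' or 'Assistant B'."
    )


def parse_verdict(raw_output: str) -> dict:
    """Extract an explanation and a verdict from generated text.

    Assumes (hypothetically) that reasoning may be wrapped in
    <think>...</think> and that the answer may be a JSON object with
    'explanation' and 'score' keys; falls back to a plain-text search.
    """
    # Remove any reasoning block so it cannot be mistaken for the verdict.
    text = re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()

    # First attempt: a JSON object somewhere in the remaining text.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match:
        try:
            obj = json.loads(match.group(0))
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and obj.get("score") in VALID_VERDICTS:
            return {"explanation": str(obj.get("explanation", "")),
                    "verdict": obj["score"]}

    # Fallback: take the last verdict label mentioned in the text.
    found = re.findall(r"Assistant [AB]", text)
    if found:
        return {"explanation": text, "verdict": found[-1]}

    # No recognizable verdict; let the caller decide how to handle it.
    return {"explanation": text, "verdict": None}
```

The prompt string would be passed to whatever inference stack serves the model, and the generated text handed to `parse_verdict`. Returning `None` for an unrecognized verdict, rather than guessing, keeps malformed generations out of downstream scoring.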

Good For

  • Automated Content Moderation: Evaluating the safety and appropriateness of generated text.
  • Response Quality Control: Assessing the helpfulness, relevance, and overall quality of AI assistant outputs.
  • Preference Optimization: Providing structured feedback for training and refining language models.
  • Multilingual Applications: Evaluating responses in a broad array of languages, leveraging its 72-language training data.
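
For the preference-optimization use case above, a pairwise verdict can be turned into a training record. The sketch below assumes the verdict has already been extracted as one of the two labels from the card ('Assistant A' or 'Assistant B'); the helper name `to_preference_pair` and the `prompt` / `chosen` / `rejected` field names are illustrative choices, not something specified by the mR3 authors.

```python
from typing import Optional


def to_preference_pair(prompt: str, response_a: str, response_b: str,
                       verdict: Optional[str]) -> Optional[dict]:
    """Map a pairwise verdict to a chosen/rejected record.

    Returns None when the verdict is missing or unrecognized, so that
    ambiguous judgments are skipped instead of being labeled arbitrarily.
    """
    if verdict == "Assistant A":
        chosen, rejected = response_a, response_b
    elif verdict == "Assistant B":
        chosen, rejected = response_b, response_a
    else:
        return None
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def build_preference_dataset(judged: list) -> list:
    """Convert (prompt, response_a, response_b, verdict) tuples to records,
    dropping any item without a usable verdict."""
    records = []
    for prompt, response_a, response_b, verdict in judged:
        pair = to_preference_pair(prompt, response_a, response_b, verdict)
        if pair is not None:
            records.append(pair)
    return records
```

Skipping items without a clear verdict trades dataset size for label quality; whether that trade is right depends on how often the evaluator's output fails to parse.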

For more technical details, refer to the mR3 paper.