Unbabel/M-Prometheus-7B

7.6B parameters · FP8 · 131,072-token context · License: other
Overview

M-Prometheus-7B: A Multilingual LLM Judge

M-Prometheus-7B is a 7.6-billion-parameter model developed by Unbabel, engineered specifically to serve as an open LLM judge for evaluating multilingual outputs. Unlike general-purpose LLMs, it is trained to assess linguistic quality natively across multiple languages.

Key Capabilities

  • Multilingual Evaluation: Trained on a substantial dataset of 480,000 multilingual direct assessment and pairwise comparison instances, enabling native evaluation of diverse language pairs.
  • Detailed Feedback Generation: Provides long-form feedback based on specific score rubrics, assessing criteria such as accuracy, fluency, and style.
  • Direct Assessment Prompting: Utilizes a structured prompting approach, similar to Prometheus-2, for tasks like machine translation evaluation.
  • Specialized for MT Quality: Supports rubric-based machine translation evaluation, producing scores from 1 to 5 against predefined criteria.
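A direct-assessment prompt in the Prometheus-2 style can be sketched as below. The instruction wording and rubric text here are illustrative assumptions for an MT-evaluation use case, not the official M-Prometheus template:

```python
# Sketch of a Prometheus-2-style direct-assessment prompt for MT evaluation.
# NOTE: the instruction wording and rubric below are illustrative assumptions,
# not the official M-Prometheus prompt template.

PROMPT_TEMPLATE = """###Task Description:
An instruction, a response to evaluate, and a score rubric are given.
1. Write detailed feedback that assesses the response strictly by the rubric.
2. After the feedback, give an integer score between 1 and 5.
3. Output format: "Feedback: (feedback) [RESULT] (integer between 1 and 5)"

###The instruction to evaluate:
Translate the following {src_lang} source text into {tgt_lang}:
{source}

###Response to evaluate:
{translation}

###Score Rubric:
[Is the translation accurate, fluent, and stylistically appropriate?]
Score 1: The translation is largely inaccurate or unintelligible.
Score 5: The translation is accurate, fluent, and stylistically faithful.
"""


def build_prompt(src_lang: str, tgt_lang: str,
                 source: str, translation: str) -> str:
    """Fill the direct-assessment template with one MT example."""
    return PROMPT_TEMPLATE.format(
        src_lang=src_lang, tgt_lang=tgt_lang,
        source=source, translation=translation,
    )


prompt = build_prompt("German", "English",
                      "Der Hund schläft.", "The dog is sleeping.")
```

The resulting string would then be passed to the model as a single user turn (e.g. via the `transformers` text-generation pipeline).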

Good For

  • Automated Machine Translation (MT) Evaluation: Ideal for developers and researchers needing to automatically assess the quality of machine translation systems.
  • Linguistic Quality Assurance: Useful for tasks requiring detailed, rubric-based feedback on text quality in a multilingual context.
  • Research in LLM-based Evaluation: Provides a strong baseline for further research into using large language models as evaluators for complex linguistic tasks.