M-Prometheus-7B: A Multilingual LLM Judge
Overview
M-Prometheus-7B is a 7.6-billion-parameter model developed by Unbabel, engineered specifically to serve as an open LLM judge for evaluating multilingual outputs. Unlike general-purpose LLMs, it is trained to natively assess linguistic quality across multiple languages.
Key Capabilities
- Multilingual Evaluation: Trained on 480,000 multilingual direct assessment and pairwise comparison instances, enabling native evaluation of diverse language pairs.
- Detailed Feedback Generation: Provides long-form feedback based on specific score rubrics, assessing criteria such as accuracy, fluency, and style.
- Direct Assessment Prompting: Utilizes a structured prompting approach, similar to Prometheus-2, for tasks like machine translation evaluation.
- Specialized for MT Quality: Well suited to evaluating machine translation quality, assigning scores from 1 to 5 against a predefined rubric (see the prompt sketch after this list).
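
To make the direct assessment format concrete, here is a minimal Python sketch of a Prometheus-2-style prompt for MT evaluation with a 1-to-5 rubric. The template wording, the rubric text, and the `build_da_prompt` helper are illustrative assumptions, not the official M-Prometheus prompt; consult the model card for the exact format.

```python
# A minimal sketch of a Prometheus-2-style direct assessment prompt for MT
# evaluation. The template wording and rubric descriptions below are
# illustrative assumptions; check the official M-Prometheus prompt for the
# exact format.

DA_TEMPLATE = """###Task Description:
An instruction, a response to evaluate, and a score rubric representing \
evaluation criteria are given.
1. Write detailed feedback that assesses the quality of the response strictly \
based on the given score rubric.
2. After writing the feedback, write a score that is an integer between 1 and 5.
3. The output format should look as follows: \
"Feedback: (write a feedback for criteria) [RESULT] (an integer between 1 and 5)"

###The instruction to evaluate:
Translate the following {src_lang} text into {tgt_lang}: {source}

###Response to evaluate:
{translation}

###Score Rubrics:
[Is the translation accurate, fluent, and stylistically faithful to the source?]
Score 1: The translation badly distorts the source meaning or is unreadable.
Score 2: The translation contains major accuracy or fluency errors.
Score 3: The translation conveys the main meaning but with noticeable errors.
Score 4: The translation is accurate and fluent with only minor issues.
Score 5: The translation is accurate, fluent, and stylistically appropriate.

###Feedback:"""


def build_da_prompt(source: str, translation: str,
                    src_lang: str, tgt_lang: str) -> str:
    """Fill the direct assessment template for one source/translation pair."""
    return DA_TEMPLATE.format(source=source, translation=translation,
                              src_lang=src_lang, tgt_lang=tgt_lang)
```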
Good For
- Automated Machine Translation (MT) Evaluation: Ideal for developers and researchers who need to assess the quality of machine translation systems automatically (see the inference sketch after this list).
- Linguistic Quality Assurance: Useful for tasks requiring detailed, rubric-based feedback on text quality in a multilingual context.
- Research in LLM-based Evaluation: Provides a strong baseline for further research into using large language models as evaluators for complex linguistic tasks.
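
For automated MT evaluation, the sketch below shows one plausible way to query the model with Hugging Face transformers, reusing the hypothetical `build_da_prompt` helper from the earlier sketch. The Hub repo id `Unbabel/M-Prometheus-7B`, the chat-template usage, and the `[RESULT] <score>` output marker follow Prometheus-2 conventions and should be verified against the released model.

```python
# A minimal inference sketch, assuming the model is hosted on the Hugging Face
# Hub as "Unbabel/M-Prometheus-7B" and ends its feedback with a
# "[RESULT] <score>" marker, as Prometheus-2-style judges do.
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Unbabel/M-Prometheus-7B"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# build_da_prompt is the hypothetical helper from the prompt sketch above.
prompt = build_da_prompt(
    source="Der Bericht wurde gestern veröffentlicht.",
    translation="The report was published yesterday.",
    src_lang="German",
    tgt_lang="English",
)

# Chat-tuned judges typically expect the tokenizer's chat template.
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512, do_sample=False)
feedback = tokenizer.decode(output[0][inputs.shape[-1]:],
                            skip_special_tokens=True)

# Extract the integer score from the trailing "[RESULT] <score>" marker.
match = re.search(r"\[RESULT\]\s*([1-5])", feedback)
score = int(match.group(1)) if match else None
print(feedback)
print("Parsed score:", score)
```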