Unbabel/M-Prometheus-7B
Unbabel's M-Prometheus-7B is a 7.6 billion parameter language model designed as an open LLM judge for native evaluation of multilingual outputs. Trained on 480k instances of multilingual direct assessment and pairwise comparison data with long-form feedback, it is well suited to assessing machine translation quality across languages, providing detailed, rubric-based feedback on translation accuracy, fluency, and style.
M-Prometheus-7B: A Multilingual LLM Judge
M-Prometheus-7B is a 7.6 billion parameter model developed by Unbabel, specifically engineered to function as an open LLM judge for evaluating multilingual outputs. Unlike general-purpose LLMs, its core strength lies in its ability to natively assess linguistic quality across multiple languages.
Key Capabilities
- Multilingual Evaluation: Trained on a substantial dataset of 480,000 multilingual direct assessment and pairwise comparison instances, enabling native evaluation of diverse language pairs.
- Detailed Feedback Generation: Provides long-form feedback based on specific score rubrics, assessing criteria such as accuracy, fluency, and style.
- Direct Assessment Prompting: Utilizes a structured prompting approach, similar to Prometheus-2, for tasks like machine translation evaluation.
- Specialized for MT Quality: Scores machine translation outputs on a 1 to 5 scale against predefined rubrics, making it well suited to automated MT quality evaluation.
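A direct-assessment prompt of the kind described above can be assembled as plain text. The sketch below is illustrative only: the section labels, rubric wording, and `[RESULT]` terminator follow the general Prometheus-2 convention, but they are assumptions here and should be checked against the model's actual prompt template before use.

```python
# Illustrative sketch of a Prometheus-style direct-assessment prompt for MT
# evaluation. The section labels and rubric text are assumptions, not the
# exact template shipped with M-Prometheus-7B.

RUBRIC = """\
[Is the translation accurate, fluent, and stylistically faithful to the source?]
Score 1: The translation is unintelligible or unrelated to the source.
Score 2: The translation conveys some meaning but contains major errors.
Score 3: The translation is mostly accurate with noticeable fluency issues.
Score 4: The translation is accurate with only minor stylistic flaws.
Score 5: The translation is accurate, fluent, and stylistically appropriate."""

def build_da_prompt(source: str, translation: str, rubric: str = RUBRIC) -> str:
    """Assemble a direct-assessment prompt asking the judge for long-form
    feedback followed by an integer score between 1 and 5."""
    return (
        "###Task Description:\n"
        "An instruction, a response to evaluate, and a score rubric are given.\n"
        "1. Write detailed feedback assessing the response against the rubric.\n"
        "2. After the feedback, give a score that is an integer between 1 and 5.\n"
        "3. End your answer with: [RESULT] {score}\n\n"
        f"###Instruction:\nTranslate the following text:\n{source}\n\n"
        f"###Response to evaluate:\n{translation}\n\n"
        f"###Score Rubric:\n{rubric}\n\n"
        "###Feedback:"
    )

prompt = build_da_prompt("Bonjour le monde", "Hello world")
```

The resulting string would then be passed to the model through whatever inference stack is in use (e.g. a chat-completions API or a local `transformers` pipeline).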
Good For
- Automated Machine Translation (MT) Evaluation: Ideal for developers and researchers needing to automatically assess the quality of machine translation systems.
- Linguistic Quality Assurance: Useful for tasks requiring detailed, rubric-based feedback on text quality in a multilingual context.
- Research in LLM-based Evaluation: Provides a strong baseline for further research into using large language models as evaluators for complex linguistic tasks.