Unbabel/M-Prometheus-14B is a 14.8-billion-parameter open LLM judge developed by Unbabel, designed to natively evaluate multilingual outputs. The model was trained on 480,000 instances of multilingual direct assessment and pairwise comparison data, including long-form feedback. It specializes in producing detailed feedback and scores for tasks such as machine translation evaluation, guided by a comprehensive rubric, making it well suited for assessing the quality of multilingual text generation with detailed reasoning.
Overview
As an LLM judge, M-Prometheus-14B's primary function is the native evaluation of multilingual outputs, which distinguishes it from general-purpose LLMs. Its training data of 480,000 multilingual direct assessment and pairwise comparison instances includes detailed long-form feedback, so the model is trained to justify its scores rather than only emit them.
Key Capabilities
- Multilingual Evaluation: Designed to assess text quality across multiple languages.
- Detailed Feedback Generation: Provides comprehensive feedback based on specific score rubrics, similar to Prometheus-2.
- Scoring System: Assigns an integer score (1-5) based on predefined criteria like Accuracy, Fluency, and Style.
- Machine Translation (MT) Evaluation: Optimized for direct-assessment MT evaluation, using a structured prompt format to guide its assessment.
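To make the structured prompt format concrete, here is a minimal sketch of assembling a direct-assessment prompt for an MT evaluation. The `###`-section template below mirrors the general Prometheus-family layout, but the exact sections and wording are an assumption for illustration; consult the model card for the format M-Prometheus-14B was actually trained on.

```python
# Illustrative direct-assessment prompt builder (hypothetical template;
# the model's real prompt format is defined on its model card).
def build_judge_prompt(instruction: str, response: str,
                       reference: str, rubric: str) -> str:
    """Assemble a single direct-assessment prompt for an LLM judge."""
    return (
        "###Task Description:\n"
        "An instruction, a response to evaluate, a reference answer, and a "
        "score rubric are given. Write detailed feedback, then assign an "
        "integer score from 1 to 5 strictly following the rubric.\n\n"
        f"###Instruction:\n{instruction}\n\n"
        f"###Response to evaluate:\n{response}\n\n"
        f"###Reference answer:\n{reference}\n\n"
        f"###Score rubric:\n{rubric}\n"
    )

prompt = build_judge_prompt(
    instruction="Translate to German: 'The weather is nice today.'",
    response="Das Wetter ist heute schön.",
    reference="Das Wetter ist heute schön.",
    rubric="Accuracy, Fluency, and Style of the translation (1-5).",
)
```

The resulting string would then be sent to the model (e.g. via `transformers` or an inference server) as a single user turn.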
Good For
- Automated quality assessment of multilingual text generation.
- Evaluating machine translation outputs against reference answers and detailed rubrics.
- Researchers and developers needing an open-source, specialized LLM for judging text quality in diverse languages.
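When using the judge in an automated evaluation pipeline, its free-form output has to be split into feedback and a numeric score. The sketch below assumes a Prometheus-style convention in which the judgment ends with `[RESULT] <score>`; if M-Prometheus-14B uses a different terminator, only the regular expression needs to change.

```python
import re

def parse_judge_output(output: str):
    """Split a judge response into (feedback, score).

    Assumes the response ends with '[RESULT] <score>' (a Prometheus-style
    convention; an assumption here). Returns None for the score if the
    marker is missing, so callers can flag malformed judgments.
    """
    match = re.search(r"\[RESULT\]\s*([1-5])\s*$", output.strip())
    if match is None:
        return output.strip(), None
    feedback = output[: match.start()].strip()
    return feedback, int(match.group(1))

feedback, score = parse_judge_output(
    "The translation is accurate and fluent. [RESULT] 5"
)
```

Returning `None` instead of raising keeps batch evaluation running when the model occasionally omits the score marker.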