M-Prometheus-3B: Multilingual LLM Judge
M-Prometheus-3B is a specialized 3.1-billion-parameter language model developed by Unbabel to serve as an open LLM judge. Its core capability is native evaluation of multilingual outputs, particularly for machine translation (MT).
Key Capabilities
- Multilingual Evaluation: Natively assesses text quality across many languages, trained on 480,000 multilingual direct-assessment and pairwise-comparison instances.
- Detailed Feedback Generation: Provides long-form feedback based on specific score rubrics, evaluating aspects like accuracy, fluency, and style.
- Scoring System: Assigns an integer score (1-5) based on the provided rubric, enabling quantitative assessment of translation quality.
- Prometheus-2 Compatibility: Can be prompted in the same manner as Prometheus-2 models, facilitating integration for users familiar with that framework.
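Since the model accepts Prometheus-2-style prompts, an evaluation instance can be assembled as a single templated string. The sketch below builds such a prompt for a translation-judging task; the exact template wording is an assumption modeled on the Prometheus-2 absolute-grading format, so check the model card for the canonical template before relying on it.

```python
# Sketch of a Prometheus-2-style absolute-grading prompt for M-Prometheus-3B.
# The template text is an assumption based on the general Prometheus-2 format,
# not the verbatim official template.
ABS_TEMPLATE = """###Task Description:
An instruction, a response to evaluate, and a score rubric representing \
an evaluation criterion are given.
1. Write detailed feedback that assesses the quality of the response \
strictly based on the given score rubric.
2. After writing the feedback, write a score that is an integer between 1 and 5.
3. The output format should look as follows: "Feedback: (feedback) [RESULT] (1-5)"

###The instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Score Rubrics:
{rubric}

###Feedback: """


def build_prompt(instruction: str, response: str, rubric: str) -> str:
    """Fill the absolute-grading template with one evaluation instance."""
    return ABS_TEMPLATE.format(
        instruction=instruction, response=response, rubric=rubric
    )


prompt = build_prompt(
    instruction="Translate to German: 'The weather is nice today.'",
    response="Das Wetter ist heute schön.",
    rubric="Is the translation accurate, fluent, and stylistically appropriate?",
)
print(prompt)
```

The resulting string is passed to the model as a plain generation input; the model is expected to respond with long-form feedback followed by its integer score.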
Use Cases
This model is particularly well-suited for:
- Automated Machine Translation Evaluation: Ideal for developers and researchers needing to automatically assess the quality of MT systems.
- Quality Assurance: Can be integrated into workflows to provide consistent and objective feedback on translated content.
- Research in NLP: Useful for experiments and studies requiring a robust, open-source multilingual evaluation metric. More details are available in the associated paper.