Overview
Prometheus-7b-v1.0 is a 7-billion-parameter language model developed by KAIST AI, using Llama-2-Chat as its base model. It was fine-tuned on 100,000 feedback examples from the Feedback Collection dataset. This specialization targets the fine-grained evaluation of long-form responses generated by other large language models.
Key Capabilities
- Fine-grained LLM Evaluation: Prometheus is specifically designed to provide detailed, criterion-based evaluations of LLM outputs, leveraging reference answers and customized score rubrics.
- Cost-Effective Alternative to GPT-4: It offers a more economical option for evaluation tasks that would otherwise call for a model like GPT-4, with reported performance comparable to GPT-4 on various benchmarks.
- Customizable Criteria: Users can define their own evaluation criteria (e.g., child readability, cultural sensitivity, creativity) through detailed score rubrics.
- RLHF Reward Model: The model can be effectively utilized as a reward model within Reinforcement Learning from Human Feedback (RLHF) pipelines.
- Performance: Reported to outperform GPT-3.5-Turbo and Llama-2-Chat 70B on evaluation tasks, with performance comparable to GPT-4.
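The rubric-based evaluation described above can be sketched as a small prompt builder. This is a minimal illustration, not the model's official prompt template: the `Rubric` class, the `build_eval_prompt` function, the section titles, and the 1-to-5 score scale are all assumptions made for the example. Consult the model's own documentation for the exact format it was trained on.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Rubric:
    """Hypothetical container: one custom criterion plus a description per score."""
    criteria: str
    score_descriptions: Dict[int, str]  # assumed scale: integer scores 1 through 5


def build_eval_prompt(instruction: str, response: str,
                      reference_answer: str, rubric: Rubric) -> str:
    """Assemble one evaluation prompt (illustrative layout, not the official template)."""
    missing = [s for s in range(1, 6) if s not in rubric.score_descriptions]
    if missing:
        raise ValueError(f"rubric is missing descriptions for scores: {missing}")

    # Render the rubric as the criterion followed by one line per score.
    rubric_lines = [f"[{rubric.criteria}]"]
    rubric_lines += [f"Score {s}: {rubric.score_descriptions[s]}" for s in range(1, 6)]

    sections = [
        ("Instruction", instruction),
        ("Response to evaluate", response),
        ("Reference answer", reference_answer),
        ("Score rubric", "\n".join(rubric_lines)),
    ]
    return "\n\n".join(f"### {title}\n{body.strip()}" for title, body in sections)


# Example: a custom "child readability" criterion.
rubric = Rubric(
    criteria="Is the response easy for a young child to read?",
    score_descriptions={
        1: "Uses dense jargon throughout.",
        2: "Mostly hard words with a few simple sentences.",
        3: "A mix of simple and difficult language.",
        4: "Mostly simple, with one or two hard words.",
        5: "Short sentences and everyday words only.",
    },
)
prompt = build_eval_prompt(
    instruction="Explain why the sky is blue.",
    response="Sunlight bounces off tiny bits of air, and blue bounces the most.",
    reference_answer="The air scatters blue light more than other colors.",
    rubric=rubric,
)
print(prompt)
```

Keeping the rubric as structured data rather than a free-form string makes it easy to swap criteria (for instance cultural sensitivity or creativity) without touching the rest of the prompt.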
When to Use This Model
- Evaluating LLM Responses: Ideal for developers and researchers needing objective, detailed feedback on the quality of LLM-generated text, especially long-form content.
- Custom Evaluation Metrics: When standard evaluation metrics are insufficient, and specific, custom criteria are required.
- RLHF Applications: Suitable for integration into RLHF systems as a reward model to guide model training based on human-like feedback.
- Resource-Constrained Evaluation: A strong choice for high-quality evaluation when the cost or accessibility of larger models like GPT-4 is a concern.
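For the RLHF use case, the evaluator's text output has to be turned into a scalar reward. The sketch below shows one way to do that, under two stated assumptions: that the evaluator ends its feedback with a marker of the form `[RESULT] <score>`, and that scores are integers from 1 to 5. The helper names `parse_score` and `score_to_reward` are hypothetical, and the linear mapping to the range 0 to 1 is one design choice among many.

```python
import re
from typing import Optional

# Assumed output convention: free-text feedback followed by "[RESULT] <1-5>".
RESULT_PATTERN = re.compile(r"\[RESULT\]\s*([1-5])\b")


def parse_score(evaluator_output: str) -> Optional[int]:
    """Return the last well-formed score in the output, or None if there is none."""
    matches = RESULT_PATTERN.findall(evaluator_output)
    return int(matches[-1]) if matches else None


def score_to_reward(score: Optional[int], fallback: float = 0.0) -> float:
    """Map a 1-5 score linearly onto [0, 1]; use `fallback` when parsing failed."""
    if score is None:
        return fallback
    return (score - 1) / 4.0


output = "Feedback: The answer is accurate and easy to follow. [RESULT] 4"
reward = score_to_reward(parse_score(output))
print(reward)
```

Returning `None` for unparseable output, instead of guessing a score, lets the training loop decide whether to skip the sample or apply a fallback reward.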