prometheus-eval/prometheus-7b-v2.0

7B
FP8
4096
License: apache-2.0

Prometheus 2: A Specialized LLM Evaluator

Prometheus 2 is a 7-billion-parameter language model based on Mistral-Instruct. It is designed for fine-grained evaluation of other large language models (LLMs) and for use as a reward model in Reinforcement Learning from Human Feedback (RLHF), offering a specialized alternative to general-purpose models such as GPT-4 for evaluation tasks.

Key Capabilities & Features

  • Dual Grading Formats: Supports both absolute grading (direct assessment with a 1-5 score) and relative grading (pairwise ranking of two responses).
  • Weight Merging: Combines absolute- and relative-grading capabilities by merging the weights of models trained separately on each format; the authors report that, surprisingly, the merged model outperforms the single-format models on each format.
  • Specialized Training: Fine-tuned on 100K feedback instances from the Feedback Collection (absolute grading) and 200K instances from the Preference Collection (relative grading).
  • Prompt Format Guidance: Uses dedicated prompt templates for absolute and relative grading; each template takes an instruction, the response(s) to evaluate, a reference answer, and a score rubric.
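The weight-merging idea above can be illustrated with plain linear interpolation of parameters. This is a minimal sketch, assuming two checkpoints with identical parameter names and shapes, modeled here as dicts of float lists; `linear_merge` and `alpha` are hypothetical names, and the exact merging method and weights used for Prometheus 2 are described in the paper and may differ from this simple average.

```python
# Minimal sketch of linear weight merging (hypothetical helper, not the paper's code).
# Checkpoints are modeled as dicts mapping parameter names to flat lists of floats;
# real checkpoints would hold tensors instead.
Checkpoint = dict[str, list[float]]


def linear_merge(absolute_ckpt: Checkpoint, relative_ckpt: Checkpoint,
                 alpha: float = 0.5) -> Checkpoint:
    """Return alpha * absolute + (1 - alpha) * relative for every parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if absolute_ckpt.keys() != relative_ckpt.keys():
        raise ValueError("checkpoints must contain the same parameter names")
    merged: Checkpoint = {}
    for name, abs_values in absolute_ckpt.items():
        rel_values = relative_ckpt[name]
        if len(abs_values) != len(rel_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Interpolate element-wise between the two fine-tuned models.
        merged[name] = [alpha * a + (1.0 - alpha) * r
                        for a, r in zip(abs_values, rel_values)]
    return merged


if __name__ == "__main__":
    absolute = {"layer.weight": [1.0, 2.0]}
    relative = {"layer.weight": [3.0, 4.0]}
    print(linear_merge(absolute, relative))  # {'layer.weight': [2.0, 3.0]}
```

Setting `alpha` to 1.0 or 0.0 returns one of the input checkpoints unchanged; intermediate values trade off the two. With PyTorch state dicts, the same formula would be applied to each tensor.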
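The two prompt formats listed above can be sketched as follows. The templates are simplified, hypothetical versions built from the components the list names (instruction, response(s), reference answer, rubric); they are not the official Prometheus 2 templates verbatim, so check the model card for the exact wording. The parsers assume the model ends its output with a `[RESULT]` marker followed by a 1-5 score or the letter of the preferred response, which is the output format the official templates request. All function names here are hypothetical.

```python
import re

# Hypothetical, simplified templates; NOT the official Prometheus 2 prompts verbatim.
ABSOLUTE_TEMPLATE = """###Task Description:
An instruction, a response to evaluate, a reference answer that gets a score of 5, and a score rubric are given.
Write detailed feedback based strictly on the rubric, then give an integer score between 1 and 5.
Output format: "Feedback: (feedback) [RESULT] (an integer between 1 and 5)"

###The instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Reference Answer (Score 5):
{reference_answer}

###Score Rubric:
{rubric}

###Feedback: """

RELATIVE_TEMPLATE = """###Task Description:
An instruction, two responses (A and B), a reference answer, and a score rubric are given.
Write feedback comparing the responses based strictly on the rubric, then choose the better one.
Output format: "Feedback: (feedback) [RESULT] (A or B)"

###Instruction:
{instruction}

###Response A:
{response_a}

###Response B:
{response_b}

###Reference Answer:
{reference_answer}

###Score Rubric:
{rubric}

###Feedback: """


def build_absolute_prompt(instruction: str, response: str,
                          reference_answer: str, rubric: str) -> str:
    """Fill the absolute-grading template for a single response."""
    return ABSOLUTE_TEMPLATE.format(instruction=instruction, response=response,
                                    reference_answer=reference_answer, rubric=rubric)


def build_relative_prompt(instruction: str, response_a: str, response_b: str,
                          reference_answer: str, rubric: str) -> str:
    """Fill the relative-grading template for a pair of responses."""
    return RELATIVE_TEMPLATE.format(instruction=instruction, response_a=response_a,
                                    response_b=response_b,
                                    reference_answer=reference_answer, rubric=rubric)


def _split_verdict(output: str, pattern: str) -> tuple[str, str]:
    """Split model output into (feedback, verdict) around the [RESULT] marker."""
    match = re.search(pattern, output)
    if match is None:
        raise ValueError("no valid [RESULT] verdict found in model output")
    feedback = output[:match.start()].strip()
    if feedback.startswith("Feedback:"):
        feedback = feedback[len("Feedback:"):].strip()
    return feedback, match.group(1)


def parse_absolute(output: str) -> tuple[str, int]:
    """Parse 'Feedback: ... [RESULT] 4' into (feedback, 4)."""
    feedback, score = _split_verdict(output, r"\[RESULT\]\s*([1-5])\b")
    return feedback, int(score)


def parse_relative(output: str) -> tuple[str, str]:
    """Parse 'Feedback: ... [RESULT] A' into (feedback, 'A')."""
    return _split_verdict(output, r"\[RESULT\]\s*\(?([AB])\b")


if __name__ == "__main__":
    print(parse_absolute("Feedback: Clear and correct. [RESULT] 4"))  # ('Clear and correct.', 4)
```

In practice, the filled prompt would normally be wrapped in the model's chat template (for example with `tokenizer.apply_chat_template` in Hugging Face Transformers) before generation.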

When to Use Prometheus 2

  • Evaluating LLM Outputs: Ideal for developers and researchers who need detailed, rubric-based assessments of LLM responses.
  • RLHF Applications: Suitable for generating reward signals in Reinforcement Learning from Human Feedback pipelines.
  • Fine-grained Analysis: When a simple pass/fail or overall score isn't enough and detailed feedback against specific criteria is required.
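Turning Prometheus 2 grades into RLHF training signals might look like the sketch below. Both mappings are illustrative assumptions rather than procedures from the paper: an absolute 1-5 score is rescaled linearly into a scalar reward, and a relative verdict becomes a (chosen, rejected) pair of the kind preference-based training methods consume. `PreferencePair`, `score_to_reward`, and `verdict_to_pair` are hypothetical names.

```python
from typing import NamedTuple


class PreferencePair(NamedTuple):
    """A (prompt, chosen, rejected) triple for preference-based training."""
    prompt: str
    chosen: str
    rejected: str


def score_to_reward(score: int) -> float:
    """Map an absolute 1-5 grade linearly onto [-1.0, 1.0]; a score of 3 maps to 0.0."""
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    return (score - 3) / 2


def verdict_to_pair(prompt: str, response_a: str, response_b: str,
                    verdict: str) -> PreferencePair:
    """Turn a relative-grading verdict ('A' or 'B') into a preference pair."""
    if verdict == "A":
        return PreferencePair(prompt, chosen=response_a, rejected=response_b)
    if verdict == "B":
        return PreferencePair(prompt, chosen=response_b, rejected=response_a)
    raise ValueError(f"verdict must be 'A' or 'B', got {verdict!r}")


if __name__ == "__main__":
    print([score_to_reward(s) for s in range(1, 6)])  # [-1.0, -0.5, 0.0, 0.5, 1.0]
    print(verdict_to_pair("Explain recursion.", "Answer A", "Answer B", "B").chosen)  # Answer B
```

A linear rescale preserves the ordering of the rubric scores; whether to center the reward at zero, as here, depends on the RL algorithm in use.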

Prometheus 2 is open source; the research behind it is described in the paper "Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models" (arXiv:2405.01535).