prometheus-eval/prometheus-7b-v2.0

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 7B
  • Quant: FP8
  • Context Length: 4k
  • Published: Feb 13, 2024
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

Prometheus 2 (prometheus-eval/prometheus-7b-v2.0) is a 7-billion-parameter language model developed by prometheus-eval. It is built on Mistral-Instruct and has a 4,096-token context length. It is fine-tuned on 100K feedback instances and 200K preference instances for fine-grained evaluation of other LLMs, and it can also serve as a reward model for RLHF. It supports both absolute grading (direct assessment) and relative grading (pairwise ranking), combining the two capabilities through weight merging, which makes it an open alternative to GPT-4 for detailed LLM evaluation.
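The card does not spell out the merging recipe. As a rough illustration of what weight merging means, the sketch below linearly interpolates the parameters of two models fine-tuned from the same base, one standing in for an absolute-grading evaluator and one for a relative-grading evaluator. The toy dictionaries, the `linear_merge` name, and the 0.5 mixing weight are hypothetical; this is a generic linear merge, not necessarily the exact method used to build Prometheus 2.

```python
from typing import Dict, List

# Hypothetical toy "state dicts": parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def linear_merge(a: StateDict, b: StateDict, alpha: float = 0.5) -> StateDict:
    """Return alpha * a + (1 - alpha) * b for every parameter.

    Both models must share parameter names and shapes, which holds when
    they are fine-tuned from the same base model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("models must have the same parameter names")
    merged: StateDict = {}
    for name in a:
        wa, wb = a[name], b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise interpolation of corresponding weights.
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(wa, wb)]
    return merged


# Toy example (hypothetical values): one "absolute" and one "relative" evaluator.
absolute_model = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
relative_model = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = linear_merge(absolute_model, relative_model, alpha=0.5)
print(merged)  # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

With `alpha=0.5` each merged weight is the plain average of the two models; other values bias the result toward one grading format.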


Prometheus 2: A Specialized LLM Evaluator

Prometheus 2 is a 7-billion-parameter language model based on Mistral-Instruct, designed for fine-grained evaluation of other large language models (LLMs) and for use as a reward model in Reinforcement Learning from Human Feedback (RLHF). It offers a specialized, open alternative to general-purpose models such as GPT-4 for evaluation tasks.

Key Capabilities & Features

  • Dual Grading Formats: Supports both absolute grading (direct assessment with a 1-5 score) and relative grading (pairwise ranking of two responses).
  • Weight Merging: Combines absolute-grading and relative-grading capabilities in a single model by merging the weights of separately trained evaluators; according to the paper, the merged model performs better on each format than a model trained on that format alone.
  • Specialized Training: Fine-tuned on 100K feedback instances from the Feedback Collection and 200K preference instances from the Preference Collection.
  • Prompt Format Guidance: Provides dedicated prompt templates for absolute and relative grading, which include components such as the instruction, the response (or two responses), a reference answer, and a score rubric.
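The sketch below shows the general shape of an absolute-grading prompt built from these components, plus a parser for the verdict that follows a `[RESULT]` marker in the model's output. The template text here is a hypothetical, simplified stand-in; the official Prometheus 2 templates are more detailed, so copy them from the upstream model card for real use. The function names `build_absolute_prompt` and `parse_result` are illustrative assumptions, not part of any library.

```python
import re
from typing import Dict, Optional

# Hypothetical, simplified absolute-grading template (not the official wording).
ABSOLUTE_TEMPLATE = """###Task Description:
An instruction, a response to evaluate, a reference answer that gets a score of 5, and a score rubric are given.
Write detailed feedback based strictly on the rubric, then a score between 1 and 5.
Output format: "(feedback) [RESULT] (an integer between 1 and 5)"

###The instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Reference Answer (Score 5):
{reference_answer}

###Score Rubrics:
[{criteria}]
{score_descriptions}

###Feedback: """


def build_absolute_prompt(instruction: str, response: str, reference_answer: str,
                          criteria: str, scores: Dict[int, str]) -> str:
    """Fill the template; `scores` maps each score 1-5 to its rubric description."""
    if sorted(scores) != [1, 2, 3, 4, 5]:
        raise ValueError("rubric must describe scores 1 through 5")
    score_descriptions = "\n".join(f"Score {k}: {scores[k]}" for k in range(1, 6))
    return ABSOLUTE_TEMPLATE.format(
        instruction=instruction,
        response=response,
        reference_answer=reference_answer,
        criteria=criteria,
        score_descriptions=score_descriptions,
    )


def parse_result(output: str) -> Optional[str]:
    """Return the verdict after the last '[RESULT]' marker.

    Gives '1'-'5' for absolute grading or 'A'/'B' for relative grading,
    and None if the output does not follow the expected format.
    """
    matches = re.findall(r"\[RESULT\]\s*([1-5]|A|B)\b", output)
    return matches[-1] if matches else None


rubric = {1: "Rude", 2: "Curt", 3: "Neutral", 4: "Friendly", 5: "Warm and helpful"}
prompt = build_absolute_prompt("Greet the user.", "Hello!", "Hi there, how can I help?",
                               "Is the reply friendly?", rubric)
print(parse_result("Polite but brief. [RESULT] 4"))  # 4
```

Taking the last `[RESULT]` match is a defensive choice for outputs where the model restates the format string before its actual verdict.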

When to Use Prometheus 2

  • Evaluating LLM Outputs: Suited to developers and researchers who need detailed, rubric-based assessments of LLM responses.
  • RLHF Applications: Suitable for generating reward signals in Reinforcement Learning from Human Feedback pipelines.
  • Fine-grained Analysis: Useful when a pass/fail verdict or a single overall score is not enough and you need detailed feedback against specific criteria.
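For RLHF use, the evaluator's verdicts must be turned into training signals. The sketch below shows one simple way to do this, assuming you have already parsed an integer score (absolute grading) or an "A"/"B" verdict (relative grading). The linear rescaling to [0, 1] and the (chosen, rejected) pair format are illustrative assumptions, not something the card prescribes.

```python
from typing import Tuple


def score_to_reward(score: int) -> float:
    """Linearly rescale an absolute-grading score (1-5) to a reward in [0, 1]."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    return (score - 1) / 4


def verdict_to_preference(response_a: str, response_b: str, verdict: str) -> Tuple[str, str]:
    """Turn a relative-grading verdict ('A' or 'B') into a (chosen, rejected) pair."""
    if verdict == "A":
        return response_a, response_b
    if verdict == "B":
        return response_b, response_a
    raise ValueError("verdict must be 'A' or 'B'")


print([score_to_reward(s) for s in range(1, 6)])  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(verdict_to_preference("short answer", "detailed answer", "B"))
# ('detailed answer', 'short answer')
```

Scalar rewards fit policy-gradient style training, while (chosen, rejected) pairs fit preference-based methods; which one you need depends on your pipeline.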

Prometheus 2 is open source; the research behind it is described in the paper "Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models" (arXiv:2405.01535).

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model set the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p