prometheus-eval/prometheus-13b-v1.0

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · Published: Oct 12, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

Prometheus-13b-v1.0 by KAIST AI is a 13 billion parameter language model based on Llama-2-Chat, fine-tuned on 100K feedback examples from the Feedback Collection dataset. It specializes in fine-grained evaluation of long-form responses, outperforming GPT-3.5-Turbo and Llama-2-Chat 70B and performing on par with GPT-4 on evaluation tasks. The model is designed for customized LLM evaluation using reference answers and score rubrics, and can also serve as a reward model for Reinforcement Learning from Human Feedback (RLHF).


Overview

Prometheus-13b-v1.0, developed by KAIST AI, is a 13 billion parameter language model built upon the Llama-2-Chat architecture. It has been extensively fine-tuned using 100,000 feedback examples from the Feedback Collection dataset. This specialized training enables Prometheus to excel in the fine-grained evaluation of long-form responses, a task where it demonstrates performance comparable to GPT-4 and superior to GPT-3.5-Turbo and Llama-2-Chat 70B.

Key Capabilities

  • Fine-grained LLM Evaluation: Prometheus is designed to evaluate other large language models against customized criteria, using a provided instruction, the response to evaluate, a reference answer (score 5), and a detailed score rubric (a minimal usage sketch follows this list).
  • Cost-Effective Alternative to GPT-4: It offers a powerful yet cheaper solution for LLM evaluation, allowing users to define specific criteria such as child readability, cultural sensitivity, or creativity.
  • Reward Model for RLHF: The model can be used effectively as a reward model in Reinforcement Learning from Human Feedback (RLHF) frameworks.
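
The following is a minimal sketch of running the model as an evaluator with the Hugging Face transformers library. The checkpoint id comes from this page; the generation settings (greedy decoding, up to 512 new tokens) are illustrative assumptions rather than official recommendations, and a 13B model in fp16 requires substantial GPU memory.

```python
# Minimal sketch: load Prometheus-13b-v1.0 and generate an evaluation.
# Assumes a CUDA-capable machine; prompt construction is covered under
# "Prompt Format" below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prometheus-eval/prometheus-13b-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def evaluate(prompt: str) -> str:
    """Generate feedback and a score for a fully formatted evaluation prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Strip the prompt tokens so only the generated evaluation remains.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```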

Prompt Format

Prometheus requires a specific input format comprising an instruction, the response under evaluation, a reference answer, and a score rubric with detailed criteria descriptions for scores 1 through 5. This structured input ensures precise, context-aware evaluations. The model's output consists of detailed feedback followed by an integer score from 1 to 5, with the two separated by the [RESULT] delimiter.
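
Below is a sketch of such a template together with a parser for the [RESULT] convention. The section headers and task wording here are paraphrased assumptions based on the structure described above, not the verbatim template; consult the official model card for the exact prompt.

```python
# Illustrative prompt template and output parser, assuming the structure
# described above (instruction, response, reference answer, 1-5 rubric) and
# the "feedback [RESULT] score" output convention. Header wording is an
# assumption; the official model card has the verbatim template.
import re

PROMPT_TEMPLATE = """###Task Description:
An instruction, a response to evaluate, a reference answer that gets a \
score of 5, and a score rubric are given. Write detailed feedback, then an \
integer score between 1 and 5 in the format: \
"Feedback: (feedback) [RESULT] (an integer between 1 and 5)".

###The instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Reference Answer (Score 5):
{reference_answer}

###Score Rubrics:
[{criteria}]
Score 1: {score1}
Score 2: {score2}
Score 3: {score3}
Score 4: {score4}
Score 5: {score5}

###Feedback:"""

def parse_result(generation: str) -> tuple[str, int]:
    """Split model output into (feedback, score) at the [RESULT] marker."""
    feedback, _, tail = generation.partition("[RESULT]")
    match = re.search(r"[1-5]", tail)
    if match is None:
        raise ValueError("No integer score found after [RESULT]")
    return feedback.strip(), int(match.group())
```

Because the parsed score is a single scalar, the same output can feed an RLHF pipeline directly as a reward signal, as noted under Key Capabilities.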