TIGER-Lab/TIGERScore-13B

Text Generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 4K · Published: Nov 26, 2023 · License: MIT · Architecture: Transformer · Open Weights

TIGER-Lab/TIGERScore-13B is a 13-billion-parameter LLaMA-2-based model developed by TIGER-Lab, fine-tuned as an instruction-guided, reference-free metric for evaluating text generation. It provides explainable error analysis, pinpointing each mistake with a location, aspect, explanation, and penalty score. TIGERScore correlates strongly with human ratings across a range of text generation tasks, surpassing many existing reference-based and reference-free metrics.


TIGERScore-13B: An Explainable, Reference-Free Text Generation Metric

TIGERScore-13B, developed by TIGER-Lab, is a 13-billion-parameter model based on LLaMA-2, trained specifically as a metric for evaluating text generation. Unlike traditional metrics, which often rely on reference texts or are limited to specific domains, TIGERScore operates reference-free and provides explainable error analysis.

Key Capabilities

  • Instruction-Guided Evaluation: Evaluates text generation based on natural language instructions.
  • Detailed Error Analysis: Pinpoints errors in generated text by identifying location, aspect, explanation, and assigning penalty scores.
  • Reference-Free: Assesses quality without needing a ground-truth reference output.
  • Broad Task Coverage: Trained on the MetricInstruct dataset, covering six text generation tasks (summarization, translation, data-to-text, long-form QA, MathQA, and instruction following) across 23 datasets.
  • High Correlation with Human Ratings: Demonstrates superior correlation with human judgments compared to many existing metrics, both reference-based and reference-free, across various tasks.
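Because evaluation is instruction-guided and reference-free, the model only needs the task instruction, the source input, and the hypothesis to be judged. The exact prompt template TIGERScore expects is defined in the TIGER-Lab repository; the sketch below is an illustrative assumption based on the fields described above, not the official template.

```python
# Illustrative sketch only: the field names and wording below are assumptions
# modeled on the description (instruction, source input, hypothesis output);
# consult the TIGER-Lab repository for the exact template the model was
# fine-tuned on.
def build_tigerscore_prompt(instruction: str, source: str, hypothesis: str) -> str:
    """Assemble an instruction-guided, reference-free evaluation prompt."""
    return (
        "You are evaluating the errors in a model-generated output "
        "for a given instruction.\n"
        f"Instruction: {instruction}\n"
        f"Source: {source}\n"
        f"Model-generated output: {hypothesis}\n"
        "For each error, identify its location, aspect, explanation, "
        "and assign a penalty score."
    )

prompt = build_tigerscore_prompt(
    instruction="Summarize the article in one sentence.",
    source="The city council approved the new transit budget on Monday...",
    hypothesis="The council rejected the budget.",
)
```

The resulting string would then be passed to the model (e.g., loaded via `transformers` from `TIGER-Lab/TIGERScore-13B`) for generation of the error analysis.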

Good For

  • Automated Evaluation of LLM Outputs: Ideal for developers and researchers needing to automatically assess the quality of text generated by large language models.
  • Debugging and Improving Text Generation: The detailed error explanations can help in understanding specific weaknesses in generation models and guide improvements.
  • Research in Text Evaluation: Offers a powerful, interpretable, and easy-to-use tool for advancing research in universal explainable metrics for text generation.
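Since each identified error carries a penalty score, the free-text analysis can be reduced to a single number for automated pipelines. The output layout below ("Penalty score: -4.0" on its own line) is a hypothetical format chosen for illustration; the model's actual phrasing may differ, so the regular expression would need adjusting against real outputs.

```python
import re

# Hypothetical parser: TIGERScore reports each error with a location, aspect,
# explanation, and penalty score. The exact line layout of its output is an
# assumption here; this sketch only illustrates aggregating penalties into
# one overall score.
def total_penalty(analysis: str) -> float:
    """Sum all penalty scores found in an error-analysis transcript."""
    scores = re.findall(r"[Pp]enalty score:\s*(-?\d+(?:\.\d+)?)", analysis)
    return sum(float(s) for s in scores)

sample = (
    "Error location: 'rejected'\n"
    "Aspect: factual consistency\n"
    "Explanation: The source states the budget was approved, not rejected.\n"
    "Penalty score: -4.0\n"
)
overall = total_penalty(sample)  # -> -4.0
```

A lower (more negative) total indicates a worse hypothesis, which makes the score directly usable for ranking candidate outputs or tracking regressions across model versions.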