TIGER-Lab/TIGERScore-7B

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Nov 26, 2023 | License: MIT | Architecture: Transformer | Open Weights

TIGER-Lab's TIGERScore-7B is a 7-billion-parameter model based on LLaMA-2 and designed for explainable, reference-free evaluation of text generation tasks. It is fine-tuned on the MetricInstruct dataset, which covers 6 text generation tasks and 23 datasets. For each error it finds, the model reports the location, the aspect affected, an explanation, and a penalty score, which makes its evaluations interpretable across tasks.


TIGERScore-7B: Explainable, Reference-Free Text Generation Evaluation

TIGERScore-7B, developed by TIGER-Lab, is a 7 billion parameter model built on LLaMA-2, specifically designed to evaluate text generation tasks without needing a reference. It addresses common limitations of existing metrics, such as reliance on references, domain specificity, and lack of attribution, by providing detailed, instruction-guided error analysis.
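As a rough illustration of what such an error analysis might look like once parsed, the sketch below models each reported error with the four fields this card names (location, aspect, explanation, penalty score). The class name, field names, example findings, and the idea of summing penalties into one overall score are all assumptions made for illustration; they are not taken from the model's actual output format or from any TIGER-Lab code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorFinding:
    """One error reported by the evaluator (hypothetical field names)."""
    location: str     # where in the generated text the error occurs
    aspect: str       # quality aspect affected, e.g. "accuracy"
    explanation: str  # free-text justification for the error
    penalty: float    # non-positive score reduction for this error


def total_score(findings: List[ErrorFinding]) -> float:
    """Assumed aggregation: sum the penalties; an error-free output scores 0.0."""
    return float(sum(f.penalty for f in findings))


# Made-up findings, for illustration only.
findings = [
    ErrorFinding("'in 1995'", "accuracy", "The source gives the year as 1997.", -4.0),
    ErrorFinding("final sentence", "fluency", "The sentence is left incomplete.", -1.5),
]

print(total_score(findings))
```

Keeping the per-error records alongside the aggregate lets a reader trace a low score back to specific spans, which is the point of an explainable metric.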

Key Capabilities

  • Reference-Free Evaluation: Assesses generated text quality without requiring a ground-truth reference.
  • Explainable Error Analysis: Pinpoints errors with specific locations, aspects, explanations, and penalty scores.
  • Universal Applicability: Trained on the MetricInstruct dataset, which covers 6 text generation tasks and 23 datasets, so it can be applied across a broad range of generation scenarios.
  • High Correlation with Human Ratings: Correlates more strongly with human judgments than many existing reference-based and reference-free metrics, as measured by Kendall, Pearson, and Spearman correlation coefficients.
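
For readers unfamiliar with the three correlation measures named above, here is a minimal pure-Python sketch of how agreement between a metric's scores and human ratings could be computed. The score lists are invented for illustration and are not results from TIGERScore; the Kendall variant shown is tau-a (no tie correction), chosen for brevity, and may differ from the variant used in the published evaluations.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: metric scores (higher is better) and human ratings.
metric_scores = [-5.5, -1.0, 0.0, -8.0, -2.5]
human_ratings = [2, 5, 4, 1, 3]

print("Pearson: ", pearson(metric_scores, human_ratings))
print("Spearman:", spearman(metric_scores, human_ratings))
print("Kendall: ", kendall_tau_a(metric_scores, human_ratings))
```

Pearson measures linear agreement on the raw values, while Spearman and Kendall depend only on the ordering, so they are less sensitive to the scale on which a metric reports its scores.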

Good For

  • Developers and researchers needing an automated, interpretable metric for evaluating LLM outputs.
  • Tasks requiring detailed feedback on generated text quality, beyond a single score.
  • Evaluating text generation models on summarization, translation, data-to-text, long-form QA, MathQA, and instruction following, as well as story generation, which is not among the 6 MetricInstruct training tasks.
  • Situations where ground-truth references are unavailable or difficult to obtain.