TIGER-Lab/TIGERScore-7B
TIGER-Lab's TIGERScore-7B is a 7-billion-parameter, LLaMA-2-based model designed for explainable, reference-free evaluation of text generation. It is fine-tuned on the MetricInstruct dataset, which spans 6 text generation tasks and 23 datasets. The model produces detailed error analyses, identifying each error's location, aspect, explanation, and penalty score, making it a broadly applicable, interpretable evaluation tool.
TIGERScore-7B: Explainable, Reference-Free Text Generation Evaluation
TIGERScore-7B, developed by TIGER-Lab, is a 7-billion-parameter model built on LLaMA-2 and designed to evaluate text generation tasks without needing a reference. It addresses common limitations of existing metrics, such as reliance on references, domain specificity, and lack of attribution, by producing detailed, instruction-guided error analysis.
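Since TIGERScore-7B is a standard causal LM on the Hugging Face Hub, one way to try it is with plain transformers. The sketch below is illustrative only: the prompt layout is an assumption for demonstration, not the exact template used at fine-tuning time (that template is defined in the TIGERScore repository), and the example instruction, source, and output are invented.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TIGER-Lab/TIGERScore-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed prompt layout for illustration; consult the TIGERScore repo
# for the actual instruction template the model was trained with.
prompt = (
    "You are evaluating the quality of a model-generated output.\n"
    "Instruction: Summarize the following article.\n"
    "Source: The city council met on Tuesday to discuss the new transit plan...\n"
    "Model output: The council rejected the transit plan on Monday.\n"
    "List all errors with their location, aspect, explanation, and score reduction."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```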
Key Capabilities
- Reference-Free Evaluation: Assesses generated text quality without requiring a ground-truth reference.
- Explainable Error Analysis: Pinpoints errors with specific locations, aspects, explanations, and penalty scores (see the parsing sketch after this list).
- Universal Applicability: Trained on the comprehensive MetricInstruct dataset, covering 6 diverse text generation tasks and 23 datasets, so it transfers across a wide range of text generation scenarios.
- High Correlation with Human Ratings: Correlates with human judgments more strongly than many existing reference-based and reference-free metrics, as measured by Kendall, Pearson, and Spearman correlations.
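Because each identified error carries a penalty, the free-text analysis can be reduced to a single aggregate score. The sketch below assumes a hypothetical output format with "Score reduction:" lines; the model's actual wording can vary, so treat this parser as a starting point rather than a fixed contract.

```python
import re

# Hypothetical example of a TIGERScore-style error analysis (invented here
# for illustration; real outputs may phrase fields differently).
analysis = """\
Error 1:
Error location: "on Monday"
Error aspect: Factual consistency
Explanation: The source states the council met on Tuesday, not Monday.
Severity: Major
Score reduction: 4.0
Error 2:
Error location: "rejected the transit plan"
Error aspect: Faithfulness
Explanation: The source does not say the plan was rejected.
Severity: Major
Score reduction: 5.0
"""

# Sum the per-error penalties into one overall (negative) quality score.
penalties = [float(p) for p in re.findall(r"Score reduction:\s*([\d.]+)", analysis)]
total_score = -sum(penalties)
print(f"{len(penalties)} errors found, overall score: {total_score}")  # -9.0
```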
Good For
- Developers and researchers needing an automated, interpretable metric for evaluating LLM outputs.
- Tasks requiring detailed feedback on generated text quality, beyond a single score.
- Evaluating text generation models across summarization, translation, data-to-text, long-form QA, MathQA, instruction following, and story generation.
- Situations where ground-truth references are unavailable or difficult to obtain.