allenai/truthfulqa-info-judge-llama2-7B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:Feb 7, 2024License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

The allenai/truthfulqa-info-judge-llama2-7B is a 7 billion parameter LLaMa2-based model developed by AllenAI, specifically fine-tuned to serve as an informativeness judge for TruthfulQA evaluations. This model replaces the deprecated OpenAI Curie engine previously used for assessing informativeness in language model responses. It is designed for accessible and reproducible evaluation of new models on fixed TruthfulQA prompts, offering a specialized tool for research in truthfulness and informativeness.

Loading preview...

Overview

This model, allenai/truthfulqa-info-judge-llama2-7B, is a specialized 7 billion parameter LLaMa2-based model developed by AllenAI. Its primary function is to act as an informativeness judge within the TruthfulQA evaluation framework. It was created to replace the original OpenAI Curie engine, which is no longer available, thereby making TruthfulQA evaluations more accessible and reproducible for researchers.

Key Capabilities

  • Informativeness Evaluation: Specifically trained to assess the informativeness of language model responses in the context of TruthfulQA prompts.
  • Reproducible Research: Enables consistent and open-source evaluation of model informativeness, overcoming the limitations of proprietary APIs.
  • LLaMa2-7B Base: Built upon the LLaMa2 7B architecture, providing a robust foundation for its specialized task.

Intended Use

This model is exclusively intended for TruthfulQA evaluation. While it is designed to generalize to the evaluation of new models on the fixed set of prompts within TruthfulQA, its generalization to entirely new or unseen prompts may be limited. Developers can integrate it into their evaluation pipelines using the provided Python script example for assessing informativeness.