Overview
This model, tifa-benchmark/llama2_tifa_question_generation, is a fine-tuned LLaMA 2 (7B parameters) language model that parses image descriptions and generates questions about them. It was developed for the ICCV 2023 paper "TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering" and serves as a substitute for the GPT-3 model used for this step in the original research.
Key Capabilities
- Text-to-QA Generation: Given an image description (for example, a text-to-image prompt), the model generates multiple-choice question-answer pairs that can be used to check whether an image faithfully depicts that description.
- Concept Classification: It classifies concepts within a description into types such as object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, or other.
- Faithfulness Evaluation: The generated Q&A pairs are used to evaluate the faithfulness of text-to-image models by checking whether a visual question answering (VQA) model, given the generated image, answers these questions correctly.
- Structured Output: The model produces structured output, including entities, activities, colors, and specific questions about each element, along with choices and correct answers.
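This card does not spell out the exact output format. The sketch below assumes a plain-text layout: an `About <element> (<type>):` header for each concept, followed by `Q:`, `Choices:`, and `A:` lines. That layout, the `QAPair` class, and the `parse_qa_output` function are hypothetical illustrations, not the model's documented format or the tifascore package's parser.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical record for one parsed question (assumed format, see lead-in).
@dataclass
class QAPair:
    element: str        # concept the question is about, e.g. "man"
    element_type: str   # concept type, e.g. "human", "activity", "color"
    question: str
    choices: List[str]
    answer: str

# Matches an assumed header line such as "About man (human):".
ABOUT_RE = re.compile(r"^About (.+) \((\w+)\):$")

def parse_qa_output(text: str) -> List[QAPair]:
    """Parse raw generated text in the assumed layout into QAPair records."""
    pairs: List[QAPair] = []
    element = element_type = question = None
    choices: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = ABOUT_RE.match(line)
        if header:
            element, element_type = header.group(1), header.group(2)
        elif line.startswith("Q:"):
            question, choices = line[2:].strip(), []
        elif line.startswith("Choices:"):
            choices = [c.strip() for c in line[len("Choices:"):].split(",") if c.strip()]
        elif line.startswith("A:") and question is not None and element is not None:
            # An answer line closes the current question.
            pairs.append(QAPair(element, element_type, question, choices, line[2:].strip()))
            question = None
    return pairs

# Hypothetical sample output for the description "a man riding a bike".
SAMPLE = """Entities: man, bike
Activities: riding
Colors:
Questions and answers are below:
About man (human):
Q: is there a man?
Choices: yes, no
A: yes
About riding (activity):
Q: what is the man doing?
Choices: riding, walking, running, sitting
A: riding
"""

pairs = parse_qa_output(SAMPLE)
for p in pairs:
    print(p.element_type, "|", p.question, "|", p.choices, "->", p.answer)
```

Lines that do not match the assumed markers (the entity, activity, and color summaries) are simply skipped, so the parser keeps only the question records needed for evaluation.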
Good For
- Evaluating Text-to-Image Models: Suited to researchers and developers who need an automated, interpretable metric to assess how accurately generated images reflect their input text prompts.
- Automated Question Generation: Useful for creating targeted questions from descriptive text, particularly in contexts related to visual content analysis.
- Integration with TIFA Framework: Designed to work with the tifascore package, which runs the full faithfulness evaluation and parses this model's raw output into a usable format.
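The evaluation idea can be sketched as question-answering accuracy: ask a VQA model each generated question about the image and count how often it picks the expected answer, overall and per concept type (the per-type breakdown is what makes the metric interpretable). This is a minimal sketch, assuming case-insensitive exact matching of answers; `faithfulness_score` and `stub_vqa` are hypothetical names, and the stub stands in for a real VQA model. The tifascore package may score answers differently.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# One QA item: (concept type, question, answer choices, expected answer).
QAItem = Tuple[str, str, List[str], str]

def faithfulness_score(
    qa_items: List[QAItem],
    answer_fn: Callable[[str, List[str]], str],
) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and per-type accuracy of answer_fn on qa_items."""
    if not qa_items:
        raise ValueError("need at least one question")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, question, choices, expected in qa_items:
        predicted = answer_fn(question, choices)  # e.g. a VQA model run on the image
        total[category] += 1
        if predicted.strip().lower() == expected.strip().lower():
            correct[category] += 1
    overall = sum(correct.values()) / len(qa_items)
    per_type = {c: correct[c] / total[c] for c in total}
    return overall, per_type

# Hypothetical questions for the prompt "a man riding a red bike".
items: List[QAItem] = [
    ("human", "is there a man?", ["yes", "no"], "yes"),
    ("activity", "what is the man doing?", ["riding", "walking"], "riding"),
    ("color", "what color is the bike?", ["red", "blue"], "red"),
]

# Stub VQA answers for an image that shows a blue bike instead of a red one.
FAKE_ANSWERS = {
    "is there a man?": "yes",
    "what is the man doing?": "riding",
    "what color is the bike?": "blue",
}

def stub_vqa(question: str, choices: List[str]) -> str:
    return FAKE_ANSWERS[question]

overall, per_type = faithfulness_score(items, stub_vqa)
print(round(overall, 3))  # 0.667
print(per_type)           # {'human': 1.0, 'activity': 1.0, 'color': 0.0}
```

Here the per-type scores localize the failure: the image got the man and the activity right but the bike's color wrong.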