tifa-benchmark/llama2_tifa_question_generation

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Aug 16, 2023 | License: apache-2.0 | Architecture: Transformer | Open Weights

The tifa-benchmark/llama2_tifa_question_generation model is a 7 billion parameter LLaMA 2-based language model, fine-tuned for generating question-answer pairs from image descriptions. Developed as part of the TIFA (Text-to-Image Faithfulness evaluation) project, it parses text prompts into visual entities, attributes, and relations. This model's primary strength lies in its ability to automatically create structured question-answer tuples for evaluating the faithfulness of text-to-image generation models.

Overview

This model, tifa-benchmark/llama2_tifa_question_generation, is a fine-tuned LLaMA 2 (7B parameters) language model designed for specialized text parsing and question generation. It was developed for the ICCV 2023 paper "TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering" and serves as a substitute for the GPT-3 model used in the original research.

Key Capabilities

  • Text-to-QA Generation: Given an image description, the model automatically generates multiple-choice question-answer pairs to verify the description's correctness.
  • Concept Classification: It classifies concepts within a description into types such as object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, or other.
  • Faithfulness Evaluation: The generated Q&A pairs are used to evaluate the faithfulness of text-to-image models by checking whether a VQA model can answer these questions correctly from the generated image.
  • Structured Output: The model produces structured output, including entities, activities, colors, and specific questions about each element, along with choices and correct answers.
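
The structured output described above can be turned into records with a small parser. The following is a minimal sketch, not the tifascore implementation: the exact text layout ("About <element> (<type>):", "Q:", "Choices:", "A:"), the sample text, and the names QAItem and parse_qa_blocks are assumptions made for illustration and should be checked against real model output.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed output layout (hypothetical; verify against real model output):
#   About <element> (<type>):
#   Q: <question>
#   Choices: <choice 1>, <choice 2>, ...
#   A: <answer>
HEADER = re.compile(r"^About (?P<element>.+?) \((?P<type>[^)]+)\):\s*$")


@dataclass
class QAItem:
    element: str        # concept the question is about, e.g. "man"
    element_type: str   # concept type, e.g. "human", "object", "color"
    question: str
    choices: List[str]
    answer: str


def parse_qa_blocks(text: str) -> List[QAItem]:
    """Collect one QAItem per complete Q / Choices / A group."""
    items: List[QAItem] = []
    element = element_type = question = None
    choices: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # A new "About ..." header starts a new element.
            element, element_type = match.group("element"), match.group("type")
            question, choices = None, []
        elif line.startswith("Q:"):
            question, choices = line[2:].strip(), []
        elif line.startswith("Choices:"):
            body = line[len("Choices:"):]
            choices = [c.strip() for c in body.split(",") if c.strip()]
        elif line.startswith("A:") and element is not None and question is not None:
            items.append(QAItem(element, element_type, question, choices, line[2:].strip()))
            question = None  # wait for the next "Q:" line
    return items


# Hypothetical sample text in the assumed layout.
sample = """Entities: man, motorcycle
Activities: sitting
Questions and answers are below:
About man (human):
Q: is this a man?
Choices: yes, no
A: yes
About motorcycle (object):
Q: what is the man sitting on?
Choices: motorcycle, bicycle, horse, chair
A: motorcycle
"""

qa_items = parse_qa_blocks(sample)
for item in qa_items:
    print(item.element_type, "|", item.question, "->", item.answer)
```

Lines that do not fit the assumed layout (such as the "Entities:" summary) are skipped rather than raising an error, which keeps the sketch tolerant of extra text around the question blocks.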

Good For

  • Evaluating Text-to-Image Models: Ideal for researchers and developers needing an automated, interpretable metric to assess how accurately generated images reflect their input text prompts.
  • Automated Question Generation: Useful for creating targeted questions from descriptive text, particularly in contexts related to visual content analysis.
  • Integration with TIFA Framework: Designed to be used with the tifascore package, which runs the full faithfulness evaluation and parses the model's output into a usable format.
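
The evaluation use case above comes down to measuring how often a VQA model answers the generated questions correctly for a given image. The sketch below illustrates that idea as a simple accuracy computation. It does not use the tifascore API: the function faithfulness_score, the answer_fn callback signature, and the canned stub answers are all hypothetical, and a real setup would pass an actual image to an actual VQA model.

```python
from typing import Any, Callable, Dict, List

# Hypothetical callback type: (image, question, choices) -> chosen answer.
AnswerFn = Callable[[Any, str, List[str]], str]


def faithfulness_score(image: Any, questions: List[Dict], answer_fn: AnswerFn) -> float:
    """Return the fraction of questions the VQA callback answers correctly."""
    if not questions:
        raise ValueError("at least one question is required")
    correct = 0
    for q in questions:
        predicted = answer_fn(image, q["question"], q["choices"])
        # Compare case-insensitively, ignoring surrounding whitespace.
        if predicted.strip().lower() == q["answer"].strip().lower():
            correct += 1
    return correct / len(questions)


# Hypothetical question set, shaped like the model's multiple-choice output.
questions = [
    {"question": "is this a man?", "choices": ["yes", "no"], "answer": "yes"},
    {"question": "what is the man sitting on?",
     "choices": ["motorcycle", "bicycle", "horse", "chair"], "answer": "motorcycle"},
    {"question": "is the man sitting?", "choices": ["yes", "no"], "answer": "yes"},
]

# Stub standing in for a VQA model: canned answers, one of them wrong.
canned = {
    "is this a man?": "yes",
    "what is the man sitting on?": "bicycle",
    "is the man sitting?": "yes",
}


def stub_vqa(image: Any, question: str, choices: List[str]) -> str:
    return canned[question]


score = faithfulness_score(None, questions, stub_vqa)
print(f"faithfulness: {score:.2f}")
```

Because each question is tied to a specific element of the description, the per-question results also show which parts of the prompt the generated image failed to reflect.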