Name: PeiyangLiu/CoE-Wiki-CoE-8B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: PeiyangLiu

CoE-Wiki-CoE-8B: Vision-Language Model for Chain-of-Evidence QA

CoE-Wiki-CoE-8B is an 8 billion parameter vision-language model developed by PeiyangLiu, specifically fine-tuned for Chain-of-Evidence (CoE) question answering. This model's primary function is to process a natural-language question alongside candidate screenshot images and generate a structured answer that includes an explicit evidence chain.

Key Capabilities

Multimodal Question Answering: Integrates natural language questions with visual information from screenshots.
Evidence Selection: Identifies and localizes supporting evidence within candidate screenshots.
Structured Output: Produces a JSON-style response containing the evidence_chain (selected screenshots and localized evidence) and the answer.
Research Focus: Intended for research in multimodal QA, visual evidence selection, and evidence-grounded reasoning over document-like documents.

Training and Usage

The model was fine-tuned on the Wiki-CoE dataset. Developers can utilize the transformers library for inference, loading the model and processor with AutoProcessor and AutoModelForImageTextToText. For reproducible results, it's recommended to use the same image preprocessing and prompt format as detailed in the CoE repository.

Related Resources

Paper: https://arxiv.org/abs/2605.01284
Dataset: https://huggingface.co/datasets/PeiyangLiu/wiki-coe

Overview

CoE-Wiki-CoE-8B: Vision-Language Model for Chain-of-Evidence QA

Key Capabilities

Training and Usage

Related Resources

Full Model Card (README)