Name: songjhPKU/RxnCaption-VL API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: songjhPKU

RxnCaption-VL: Chemical Reaction Diagram Parsing

RxnCaption-VL, developed by songjhPKU, is a specialized 7 billion parameter vision-language model built upon Qwen2.5-VL-7B-Instruct. Its core function is to parse chemical reaction diagrams, transforming visual information into structured JSON outputs.

Key Capabilities

Visual Prompt Guided Captioning: The model takes images annotated with Bounding-box Index Visual Prompts (BIVP), where bounding boxes and numeric labels highlight structures and text within the diagram.
Structured Output: It generates a JSON list for each reaction, detailing 'reactants', 'conditions', and 'products', with each element referencing either a structure index or extracted text.
Chemistry Expertise: Fine-tuned on the U-RxnDiagram-15k dataset (approximately 59,000 augmented samples), it demonstrates proficiency in interpreting complex chemical schematics.
Performance: Achieves a Hard F1 score of 75.5 and Soft F1 of 88.2 on the RxnScribe-test benchmark, and 55.5 (Hard F1) / 67.6 (Soft F1) on the U-RxnDiagram-15k-test.

Use Cases

RxnCaption-VL is ideal for automating the extraction of chemical reaction information from diagrams, supporting applications in chemical research, patent analysis, and digital chemistry databases. It provides a programmatic way to convert visual chemical data into machine-readable formats, streamlining data processing in chemistry-focused fields.

Overview

RxnCaption-VL: Chemical Reaction Diagram Parsing

Key Capabilities

Use Cases

Full Model Card (README)