songjhPKU/RxnCaption-VL

VISION | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 32k | Published: Mar 19, 2026 | License: cc-by-nc-4.0 | Architecture: Transformer | 0.0K | Open Weights | Cold

RxnCaption-VL by songjhPKU is a 7-billion-parameter vision-language model fine-tuned from Qwen2.5-VL-7B-Instruct for parsing chemical reaction diagrams. It takes images annotated with Bounding-box Index Visual Prompts (BIVP) and outputs structured JSON descriptions of the reactions they depict, identifying the reactants, conditions, and products in each diagram.


RxnCaption-VL: Chemical Reaction Diagram Parsing

RxnCaption-VL, developed by songjhPKU, is a specialized 7-billion-parameter vision-language model built on Qwen2.5-VL-7B-Instruct. Its core function is to parse chemical reaction diagrams, converting the visual information into structured JSON output.

Key Capabilities

  • Visual Prompt Guided Captioning: The model takes images annotated with Bounding-box Index Visual Prompts (BIVP), where bounding boxes and numeric labels highlight structures and text within the diagram.
  • Structured Output: It generates a JSON list for each reaction, detailing 'reactants', 'conditions', and 'products', with each element referencing either a structure index or extracted text.
  • Chemistry Expertise: The model is fine-tuned on the U-RxnDiagram-15k dataset (approximately 59,000 augmented samples) to interpret complex chemical schematics.
  • Performance: The model achieves a Hard F1 of 75.5 and a Soft F1 of 88.2 on the RxnScribe-test benchmark, and a Hard F1 of 55.5 and a Soft F1 of 67.6 on U-RxnDiagram-15k-test.
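
The structured output described above can be checked before it is used downstream. The following is a minimal sketch, not the model's documented schema: it assumes each element is either an integer (a BIVP structure index) or a string (extracted text), and the names `Reaction`, `parse_reactions`, and the sample output string are hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Union

# Assumption: an element is either an int (BIVP structure index)
# or a str (text extracted from the diagram).
Element = Union[int, str]

REQUIRED_KEYS = ("reactants", "conditions", "products")


@dataclass
class Reaction:
    reactants: List[Element]
    conditions: List[Element]
    products: List[Element]


def parse_reactions(raw: str) -> List[Reaction]:
    """Parse and validate a JSON list of reactions (hypothetical helper)."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of reactions")
    reactions = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"reaction {i} is not a JSON object")
        missing = [k for k in REQUIRED_KEYS if k not in item]
        if missing:
            raise ValueError(f"reaction {i} is missing keys: {missing}")
        for key in REQUIRED_KEYS:
            if not isinstance(item[key], list):
                raise ValueError(f"reaction {i}: '{key}' is not a list")
            for el in item[key]:
                # bool is a subclass of int in Python, so reject it explicitly.
                if isinstance(el, bool) or not isinstance(el, (int, str)):
                    raise ValueError(f"reaction {i}: bad element in '{key}'")
        reactions.append(
            Reaction(item["reactants"], item["conditions"], item["products"])
        )
    return reactions


# Made-up example output, shaped after the description above.
sample_output = (
    '[{"reactants": [1, 2], '
    '"conditions": ["Pd/C, H2", "EtOH, rt"], '
    '"products": [3]}]'
)
reactions = parse_reactions(sample_output)
```

Validating the three keys up front means a malformed generation fails loudly instead of propagating partial reactions into later processing.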

Use Cases

RxnCaption-VL is suited to automating the extraction of chemical reaction information from diagrams, supporting applications in chemical research, patent analysis, and digital chemistry databases. It provides a programmatic way to convert visual chemical data into machine-readable formats, streamlining data processing in chemistry-focused fields.
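
As one illustration of the conversion to a machine-readable record, the sketch below resolves structure indices back to their bounding boxes. It rests on assumptions: the `boxes` map (index to pixel coordinates) is hypothetical and would come from whatever upstream step drew the BIVP annotations, integer elements are treated as structure indices, and string elements as extracted text. The function name `resolve_reaction` and all sample values are invented for the example.

```python
import json
from typing import Dict, Tuple

# Assumption: boxes are (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]

ROLES = ("reactants", "conditions", "products")


def resolve_reaction(reaction: dict, boxes: Dict[int, Box]) -> dict:
    """Replace structure indices with their boxes; keep text as text."""
    resolved = {}
    for role in ROLES:
        entries = []
        for el in reaction.get(role, []):
            if isinstance(el, int) and not isinstance(el, bool):
                if el not in boxes:
                    raise KeyError(f"no bounding box for index {el}")
                entries.append({"index": el, "box": boxes[el]})
            else:
                entries.append({"text": str(el)})
        resolved[role] = entries
    return resolved


# Hypothetical annotation map and one parsed reaction.
boxes = {1: (10, 20, 110, 90), 2: (130, 20, 230, 90), 3: (300, 20, 400, 90)}
reaction = {"reactants": [1, 2], "conditions": ["Pd/C, H2"], "products": [3]}

resolved = resolve_reaction(reaction, boxes)
record = json.dumps(resolved)  # serialized form, e.g. for a database row
```

Keeping the index alongside the box lets a later step crop the region and pass it to a separate structure-recognition tool, should one be used.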