DangIT02/qwen3vl-flowchart-to-mermaid_v3

VISION | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 21, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

DangIT02/qwen3vl-flowchart-to-mermaid_v3 is an 8-billion-parameter vision-language model, fine-tuned from unsloth/Qwen3-VL-8B-Instruct to transcribe flowchart images into Mermaid code. The v3 iteration fine-tunes the vision tower, which improves transcription fidelity and brings the overall node_f1 to 0.825. It converts visual flowcharts into structured, reproducible Mermaid syntax, which suits automated diagram documentation and generation.


Overview

Developed by DangIT02 and built on unsloth/Qwen3-VL-8B-Instruct, DangIT02/qwen3vl-flowchart-to-mermaid_v3 is an 8-billion-parameter vision-language model specialized in converting flowchart images into Mermaid code. It aims to reproduce each diagram's nodes, edges, labels, and direction accurately in Mermaid syntax.

Key Capabilities & Improvements

  • Flowchart-to-Mermaid Transcription: Converts visual flowchart diagrams into valid Mermaid code.
  • Enhanced Fidelity (v3): Unlike v2, v3 fine-tunes the vision tower, which improves transcription accuracy, most of all on small diagrams.
  • Canonicalized Output: Generates Mermaid code with canonicalized node IDs (A, B, C, etc.) for deterministic output and tool-use compatibility, while preserving original node and edge labels.
  • Performance: Achieves an overall node_f1 of 0.825, with significant gains on small diagrams (node_f1 of 0.872, a +0.366 improvement over v2).
  • High Parse Success: Reports a 100% parse success rate for generated Mermaid code.
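
The canonicalization and scoring points above can be illustrated with a small sketch. It is not the author's pipeline: the card does not publish its canonicalization code or define node_f1, so everything here is an assumption. `canonicalize` renames node IDs to A, B, C in order of first appearance and leaves labels untouched; continuing with AA, AB after Z is a guess. `node_f1` assumes the metric is an F1 over the multiset of node labels. Both functions are hypothetical helpers that handle only simple shapes (`[]`, `()`, `{}`), pipe-style edge labels, and plain arrows such as `-->`, with no `subgraph` or `style` lines.

```python
import re
from collections import Counter

# Either a label chunk ([..], (..), {..}, |..|) or a bare identifier.
TOKEN = re.compile(r"(\[[^\]]*\]|\([^)]*\)|\{[^}]*\}|\|[^|]*\|)|\b([A-Za-z_]\w*)\b")
# A node definition with its label, e.g. start[Begin] or q1{Valid?}.
NODE_DEF = re.compile(r"\b[A-Za-z_]\w*\s*(?:\[([^\]]*)\]|\(([^)]*)\)|\{([^}]*)\})")


def letter_id(n: int) -> str:
    """0 -> A, 25 -> Z, 26 -> AA (the scheme past Z is an assumption)."""
    out, n = "", n + 1
    while n:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("A") + rem) + out
    return out


def canonicalize(mermaid: str) -> str:
    """Rename node IDs in order of first appearance; keep labels as-is."""
    header, *body = mermaid.strip().splitlines()
    mapping: dict[str, str] = {}

    def rename(match: re.Match) -> str:
        if match.group(1):  # label text: leave untouched
            return match.group(1)
        old = match.group(2)
        if old not in mapping:
            mapping[old] = letter_id(len(mapping))
        return mapping[old]

    # The first line (e.g. "graph TD") is assumed to be the direction header.
    return "\n".join([header] + [TOKEN.sub(rename, line) for line in body])


def node_labels(mermaid: str) -> Counter:
    """Collect the multiset of node labels found in node definitions."""
    labels: Counter = Counter()
    for match in NODE_DEF.finditer(mermaid):
        label = next(g for g in match.groups() if g is not None)
        labels[label.strip()] += 1
    return labels


def node_f1(pred: str, ref: str) -> float:
    """F1 over node-label multisets (assumed definition of node_f1)."""
    p, r = node_labels(pred), node_labels(ref)
    tp = sum((p & r).values())
    if tp == 0:
        return 0.0
    precision, recall = tp / sum(p.values()), tp / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```

Because this score compares labels rather than IDs, renaming IDs does not change it: `node_f1(canonicalize(src), src)` is 1.0 for any diagram that contains at least one labeled node.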

Limitations

  • Direction Detection: Accuracy for diagram direction (graph TD, BT, LR) degrades on very large flowcharts (20+ nodes).
  • Hallucination: May occasionally generate plausible-but-incorrect structures for extremely complex diagrams (25+ nodes, dense text).
  • English-only Labels: Performance with non-English labels is untested.
  • Max Output: Output is limited to approximately 2048 tokens, which may truncate very large flowcharts.
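
Given the limitations above, a caller may want a few cheap checks before trusting an output. The sketch below is one hedged way to do that, not part of the model or its tooling. `review_output` is a hypothetical helper, the truncation test is a heuristic that looks only at the last line rather than counting tokens, and the 20-node threshold is taken from the direction-detection note.

```python
import re

# Direction header such as "graph TD" or "flowchart LR".
HEADER = re.compile(r"^(?:graph|flowchart)\s+(TD|TB|BT|LR|RL)\s*$")
# A node definition with a simple shape: id[..], id(..) or id{..}.
NODE_DEF = re.compile(r"\b([A-Za-z_]\w*)\s*(?:\[[^\]]*\]|\([^)]*\)|\{[^}]*\})")
LARGE_DIAGRAM_NODES = 20  # threshold taken from the limitation above


def review_output(mermaid: str) -> list[str]:
    """Return warnings for a generated diagram; an empty list means none."""
    warnings = []
    lines = [ln.strip() for ln in mermaid.splitlines() if ln.strip()]

    if not lines or not HEADER.match(lines[0]):
        warnings.append("direction header missing or unrecognized")

    # Heuristic: a cut-off generation tends to end inside a label or an edge.
    last = lines[-1] if lines else ""
    unbalanced = any(last.count(o) != last.count(c) for o, c in ("[]", "()", "{}"))
    if unbalanced or last.endswith(("-", ">", "|", "=", ".")):
        warnings.append("last line looks cut off; output may be truncated")

    # Count distinct IDs that appear with a label.
    node_ids = {m.group(1) for m in NODE_DEF.finditer(mermaid)}
    if len(node_ids) >= LARGE_DIAGRAM_NODES:
        warnings.append("large diagram; verify direction and structure by hand")

    return warnings
```

These checks cannot detect a plausible-but-incorrect structure. They only flag outputs that are likely to need a manual look.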