DangIT02/qwen3vl-flowchart-to-mermaid_v2

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Apr 20, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

DangIT02/qwen3vl-flowchart-to-mermaid_v2 is an 8 billion parameter vision-language model, based on unsloth/Qwen3-VL-8B-Instruct, specifically fine-tuned to convert flowchart images into Mermaid code. This model excels at transcribing diagram nodes, edges, labels, and direction, producing valid Mermaid syntax with canonicalized node IDs. It is optimized for medium-complexity flowcharts (10-20 nodes) and achieves 100% parse success for generated Mermaid code.

Loading preview...

Qwen3-VL-8B Flowchart-to-Mermaid (v2)

This model, developed by DangIT02, is an 8 billion parameter vision-language model fine-tuned from unsloth/Qwen3-VL-8B-Instruct. Its primary function is to transcribe flowchart diagrams from images into valid Mermaid flowchart code, preserving nodes, edges, labels, and direction.

Key Capabilities & Features

  • Flowchart-to-Mermaid Conversion: Converts visual flowcharts into Mermaid syntax.
  • Canonicalized Node IDs: Outputs Mermaid code with standardized node IDs (A, B, C, etc.) for deterministic and tool-compatible results.
  • High Parse Success: Achieves 100% parse success, ensuring all generated Mermaid code is syntactically valid.
  • Optimized for Medium Complexity: Performs best on flowcharts with 10-20 nodes, achieving a node F1 score of 0.796.
  • Efficient Training: Fine-tuned using LoRA (rank 16, alpha 16) on the DangIT02/flowchart-to-mermaid_v2 dataset, with only 0.50% of parameters trainable.

Limitations & Considerations

  • Hallucination on Small Diagrams: Tends to over-generate nodes for flowcharts with fewer than 10 nodes.
  • Direction Detection: Direction accuracy can degrade on very large diagrams (20+ nodes), sometimes defaulting to graph TD.
  • Canonical IDs Only: Does not reproduce descriptive node IDs from the original diagram; post-processing is required for semantic IDs.
  • English Labels Only: Performance with non-English labels is untested.
  • Single Image Input: Processes one flowchart image per prompt.
  • Output Length: Maximum output is approximately 2048 tokens, which may truncate very large flowcharts.