Wigtn/Qwen3-VL-2B-WigtnOCR
Wigtn/Qwen3-VL-2B-WigtnOCR is a 2-billion-parameter vision-language model (VLM) developed by WIGTN Crew and distilled from a 30B teacher model. It is optimized for document parsing: it extracts structured information from complex layouts, particularly Korean government documents, and ranks first among six parsers on retrieval metrics for those documents in Retrieval-Augmented Generation (RAG) pipelines. The model is intended for fast, production-ready inference on a single GPU, and it improves on its larger teacher in table and text extraction.
Overview
WigtnOCR-2B is a 2-billion-parameter Vision-Language Model (VLM) developed by WIGTN Crew for robust document parsing. It is distilled from a 30B-parameter teacher model, Qwen3-VL-30B, using a pseudo-label distillation method that leverages quality-filtered ground truth. With this approach, WigtnOCR-2B matches or surpasses its larger teacher in document parsing quality on several metrics while having far fewer parameters.
Key Capabilities
- Efficient Distillation: Achieves performance comparable to a 30B teacher model with only 2B parameters, making it production-ready and deployable on a single GPU.
- Superior Table Extraction: Scores 12.6 percentage points (pp) higher than its teacher on Table TEDS (Tree-Edit-Distance-based Similarity), indicating a better ability to recognize and structure tabular data.
- Optimized for Korean Documents: Specifically fine-tuned on complex Korean government document layouts, including tables, forms, and multi-column structures.
- Improved RAG Retrieval: Ranks #1 in Hit@1, Hit@5, and MRR@10 among six parsers on Korean government documents, indicating that its output improves retrieval in Retrieval-Augmented Generation (RAG) pipelines.
- Structured Markdown Output: Converts document images into well-structured Markdown, preserving headings, tables, formulas, and reading order, and can extract data from charts into tables.
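The retrieval metrics named above (Hit@1, Hit@5, MRR@10) can be illustrated with a minimal sketch. It assumes each query has exactly one relevant document and that the retriever returns a ranked list of document IDs; the function names and the data layout are hypothetical and are not taken from the WigtnOCR evaluation code.

```python
from typing import Dict, List, Sequence, Tuple


def hit_at_k(ranked_ids: Sequence[str], relevant_id: str, k: int) -> float:
    """Return 1.0 if the relevant document appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank_at_k(ranked_ids: Sequence[str], relevant_id: str, k: int) -> float:
    """Return 1/rank of the relevant document within the top-k, or 0.0 if absent."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: List[Tuple[Sequence[str], str]],
    hit_cutoffs: Sequence[int] = (1, 5),
    mrr_cutoff: int = 10,
) -> Dict[str, float]:
    """Average Hit@k and MRR@k over (ranked_ids, relevant_id) pairs."""
    if not runs:
        raise ValueError("runs must contain at least one query")
    n = len(runs)
    scores = {
        f"Hit@{k}": sum(hit_at_k(r, rel, k) for r, rel in runs) / n
        for k in hit_cutoffs
    }
    scores[f"MRR@{mrr_cutoff}"] = (
        sum(reciprocal_rank_at_k(r, rel, mrr_cutoff) for r, rel in runs) / n
    )
    return scores


if __name__ == "__main__":
    # Hypothetical retrieval results for three queries.
    example_runs = [
        (["a", "b", "c"], "a"),        # relevant document ranked first
        (["x", "y", "z"], "z"),        # relevant document ranked third
        (["p", "q"], "not-retrieved"), # relevant document never retrieved
    ]
    print(evaluate(example_runs))
```

Comparing parsers in this way means running the same queries against indexes built from each parser's output and averaging the scores per parser.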
Good For
- Digitization and parsing of Korean government documents.
- Preprocessing documents for RAG pipelines, converting PDFs into structured Markdown for improved retrieval.
- Parsing academic papers, including complex elements like tables, formulas, and maintaining reading order.
- Bilingual document processing, with optimization for both Korean and English content.
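For the RAG preprocessing use case, one common way to use structured Markdown is to split it into chunks at headings before indexing, so that each table stays with the section it belongs to. The sketch below assumes the parser output is a Markdown string with `#`-style headings; `split_markdown_by_heading` is a hypothetical helper, not part of the model or its tooling.

```python
import re
from typing import Dict, List

# Matches ATX-style Markdown headings such as "# Title" or "### 1.2 Scope".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_by_heading(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into chunks, one per heading, keeping body text and tables together."""
    chunks: List[Dict[str, str]] = []
    heading = ""           # text before the first heading gets an empty heading
    body: List[str] = []

    def flush() -> None:
        # Skip chunks that have neither a heading nor any non-blank body line.
        if heading or any(line.strip() for line in body):
            chunks.append({"heading": heading, "text": "\n".join(body).strip()})

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            heading = match.group(2)
            body = []
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    # Hypothetical parser output for a one-page document.
    page = "# Budget\n\nIntro text.\n\n## Table 1\n\n| a | b |\n|---|---|\n| 1 | 2 |\n"
    for chunk in split_markdown_by_heading(page):
        print(repr(chunk["heading"]), len(chunk["text"]))
```

Each chunk can then be embedded and indexed separately; prepending the heading to the chunk text is one option for giving the retriever extra context.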