Name: Wigtn/Qwen3-VL-2B-WigtnOCR API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Wigtn

Overview

WigtnOCR-2B is a 2 billion parameter Vision-Language Model (VLM) developed by WIGTN Crew, specifically designed for robust document parsing. It is distilled from a 30B parameter teacher model, Qwen3-VL-30B, using a pseudo-label distillation method that leverages quality-filtered ground truth. This approach allows WigtnOCR-2B to match or even surpass its larger teacher in document parsing quality across several metrics, while being significantly more efficient.

Key Capabilities

Efficient Distillation: Achieves performance comparable to a 30B teacher model with only 2B parameters, making it production-ready and deployable on a single GPU.
Superior Table Extraction: Demonstrates a notable +12.6pp improvement in Table TEDS over its teacher, indicating enhanced ability to recognize and structure tabular data.
Optimized for Korean Documents: Specifically fine-tuned on complex Korean government document layouts, including tables, forms, and multi-column structures.
Improved RAG Retrieval: Ranks #1 in Hit@1, Hit@5, and MRR@10 among six parsers on Korean government documents, proving its effectiveness in enhancing Retrieval-Augmented Generation (RAG) pipelines.
Structured Markdown Output: Converts document images into well-structured Markdown, preserving headings, tables, formulas, and reading order, and can extract data from charts into tables.

Good For

Digitization and parsing of Korean government documents.
Preprocessing documents for RAG pipelines, converting PDFs into structured Markdown for improved retrieval.
Parsing academic papers, including complex elements like tables, formulas, and maintaining reading order.
Bilingual document processing, with optimization for both Korean and English content.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)