Name: ONTHEIT/BizOnAI-OCR API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ONTHEIT

BizOnAI-OCR: Korean-Optimized Industrial Document OCR

BizOnAI-OCR, developed by ONTHEIT, is an 8 billion parameter optical character recognition (OCR) model based on Qwen3-VL-8B. Its primary focus is on accurately extracting information from a wide range of Korean industrial documents, such as contracts, medical records, and government paperwork. The model is specifically fine-tuned to handle the unique challenges of Korean document layouts, including decorative spacing, vertical tables, and mixed language content (Korean, English, Chinese).

Key Capabilities

Korean-first Optimization: Fine-tuned extensively on real-world Korean industrial documents for superior performance in this domain.
Bilingual Proficiency: While optimized for Korean, it maintains strong performance on English OCR tasks, as demonstrated by benchmarks.
Structured Markdown Output: Generates output in markdown format, preserving document structure, including tables, headings, and other formatting elements.
Efficient Deployment: Ready for efficient serving via vLLM (with an OpenAI-compatible API) or standard transformers library.

Performance Highlights

BizOnAI-OCR achieves an 83.0% overall score on KDoc-OCRBench, a challenging benchmark for Korean industrial PDFs, outperforming other models like olmOCR v0.2.0 and PaddleOCR-VL. It also demonstrates robust performance on the English olmOCR-bench, scoring 82.4% overall, indicating its strong bilingual capabilities.

Good for

Automating data extraction from Korean contracts, invoices, and legal documents.
Processing medical records and financial forms in Korean.
Applications requiring structured text output from complex, multi-lingual documents.

Overview

BizOnAI-OCR: Korean-Optimized Industrial Document OCR

Key Capabilities

Performance Highlights

Good for

Full Model Card (README)