thelamapi/next-ocr
Thelamapi/next-ocr is an 8-billion parameter vision-transformer model based on Qwen 3, specifically optimized for optical character recognition (OCR) tasks. It excels at accurate text extraction from images and documents, including complex mathematical formulas and tabular content. This model offers robust multilingual support for over 30 languages and is designed for efficiency in document understanding and analysis workflows. Its primary strength lies in handling structured and unstructured documents with high precision, making it suitable for enterprise document management and digitization.
Loading preview...
Next OCR 8B: Compact, Multilingual, Math-Optimized OCR
Next OCR 8B is an 8-billion parameter vision-transformer model, built on Qwen 3, designed for advanced optical character recognition. Developed by Lamapi, this model specializes in accurately extracting text from diverse visual inputs, including complex mathematical expressions and tabular data. It provides robust multilingual support for over 30 languages, making it versatile for global applications.
Key Capabilities
- High-Accuracy OCR: Reliably extracts text from images, documents, and screenshots.
- Multilingual Support: Processes text in 30+ languages, including Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, and Russian.
- Layout & Math Awareness: Effectively handles structured documents like tables, forms, and mathematical formulas.
- Efficiency: Optimized to be lightweight and efficient for resource-constrained environments.
- Instruction-Tuned: Designed for comprehensive document understanding and analysis tasks.
Performance Highlights
Next OCR demonstrates strong performance across various benchmarks:
- OCR-Bench Accuracy: Achieves 99.0% accuracy, outperforming PaddleOCR (95.2%) and Deepseek OCR (90.6%).
- Multilingual Accuracy: Scores 96.8%, surpassing Google Cloud Vision (95.5%) and Azure Document Intelligence (93.6%).
- Layout / Table Understanding: Reaches 95.3%, on par with PaddleOCR and exceeding Google Cloud Vision (93.6%).
- Specialized Tasks: Shows 92% accuracy for handwriting, 96% for scene text, and 91% for complex tables.
Ideal Use Cases
This model is particularly well-suited for:
- Document digitization and archival.
- Automated processing of invoices, receipts, and forms.
- Building multilingual OCR pipelines.
- Extracting data from tables, forms, and mathematical formulas.
- Enhancing enterprise document management systems.