infly/Infinity-Parser2-Pro
Infinity-Parser2-Pro by infly is a 35.1 billion parameter multi-modal document understanding model, optimized for maximum accuracy in precision-critical document parsing tasks. It achieves state-of-the-art results on olmOCR-Bench (87.6%) and ParseBench (74.3%), surpassing other frontier models. This model excels at complex document parsing, element parsing, chart parsing, chemical formula parsing, document VQA, and general multimodal understanding, leveraging an upgraded data engine and multi-task reinforcement learning.
Loading preview...
Infinity-Parser2-Pro: Advanced Document Understanding
Infinity-Parser2-Pro is infly's flagship 35.1 billion parameter multi-modal document understanding model, engineered for high accuracy in complex parsing tasks. It leverages an upgraded data engine with nearly 5 million diverse document parsing samples and a novel multi-task reinforcement learning approach for co-optimization across various tasks.
Key Capabilities
- Breakthrough Parsing Performance: Achieves 87.6% on olmOCR-Bench and 74.3% on ParseBench, outperforming models like DeepSeek-OCR-2 and PaddleOCR-VL.
- Comprehensive Task Support: Handles document parsing, element parsing, chart parsing, chemical formula parsing, document VQA, and general multimodal understanding.
- Zero-Shot Capabilities: Designed to unlock new zero-shot capabilities across a wide range of real-world business scenarios.
- Flexible Deployment: Supports native Transformers, an advanced
infinity_parser2wrapper for bulk processing, and vLLM for efficient inference.
Good For
- Precision-Critical Document Analysis: Ideal for applications requiring maximum accuracy in extracting information from diverse document types.
- Complex Layouts: Excels at parsing documents with multi-column layouts, historical newspapers, and academic papers with intricate formulas and notations.
- Structured Data Extraction: Highly effective for converting documents into structured formats like JSON or Markdown, including tables and charts.
Limitations
- Primarily supports English and Chinese documents; performance degrades with other languages.
- Accuracy may be reduced for charts with highly complex layouts or documents with multi-oriented elements.
- Does not capture fine-grained text formatting (e.g., bold, italic) and has suboptimal multimodal instruction-following capability for complex multi-step visual instructions.