Name: DreamEternal/MinerU-Popo API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: DreamEternal

MinerU-Popo: Enhancing OCR with Document Structure

DreamEternal/MinerU-Popo is a 4-billion-parameter model designed to post-process OCR output into a coherent, document-level semantic structure. Where conventional OCR stops at raw page-level text, MinerU-Popo assembles that text into a structured document tree, addressing challenges such as cross-page geometric discontinuity and scalability to long documents.

Key Capabilities

Document Tree Construction: Builds a hierarchical structure from OCR results, integrating tables, text, titles, and image-text associations.
Four Subtasks: Performs table truncation analysis, text truncation analysis, title hierarchy analysis, and image-text association analysis.
Dynamic Chunking: Processes long documents efficiently by dynamically chunking content and synchronizing across chunks to maintain global consistency.
Document Enrichment: Beyond structuring, it can generate semantic summaries and split long section nodes.
Improved Hierarchical Accuracy: Substantially raises TEDS (Tree-Edit-Distance-based Similarity) scores when applied to the output of existing OCR systems (e.g., MinerU from 53.7 to 90.6, MonkeyOCR from 48.9 to 87.4).
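To make the idea of a document tree built chunk by chunk more concrete, here is a minimal Python sketch. It is not MinerU-Popo's implementation or API: the `Node` and `TreeBuilder` names, the `(kind, text, level)` block format, and the assumption that title levels are already known are all hypothetical. It only illustrates how a stack of open titles, carried over from one chunk to the next, keeps the hierarchy consistent across chunk boundaries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str            # "root", "title", "text", "table" or "image" (hypothetical labels)
    text: str
    level: int = 0       # heading depth; only meaningful for titles
    children: list = field(default_factory=list)


class TreeBuilder:
    """Nests flat blocks under their nearest enclosing title, one chunk at a time."""

    def __init__(self):
        self.root = Node("root", "")
        # Open titles from outermost to innermost. This state survives between
        # feed() calls, which is what keeps separate chunks consistent.
        self._open = [self.root]

    def feed(self, chunk):
        for kind, text, level in chunk:
            if kind == "title":
                # Close every open title at the same or a deeper level.
                while len(self._open) > 1 and self._open[-1].level >= level:
                    self._open.pop()
                node = Node(kind, text, level)
                self._open[-1].children.append(node)
                self._open.append(node)
            else:
                # Body content attaches to the innermost open title.
                self._open[-1].children.append(Node(kind, text))


def outline(node, depth=0):
    """Returns an indented, line-per-node view of the tree."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + f"{child.kind}: {child.text}")
        lines.extend(outline(child, depth + 1))
    return lines


# Two chunks of one document; the second starts in the middle of section 1.
builder = TreeBuilder()
builder.feed([("title", "1 Introduction", 1), ("text", "Opening paragraph.", 0)])
builder.feed([("title", "1.1 Scope", 2), ("table", "Table 1", 0),
              ("title", "2 Method", 1)])
print("\n".join(outline(builder.root)))
```

In this sketch "1.1 Scope" ends up under "1 Introduction" even though the two titles arrive in different chunks. The model itself has to infer title levels, truncated tables and paragraphs, and image-text links; those steps are deliberately left out here.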

Performance Highlights

MinerU-Popo achieves a TEDS score of 90.6 at a processing speed of 0.37 documents/second, outperforming the much larger pre-trained Qwen3-VL-32B (78.0 TEDS, 0.04 documents/second) in both accuracy and speed on this specific task. It also benefits downstream retrieval and analysis tasks, improving accuracy across various categories on the ViDoRe V3 benchmark compared with raw RAG baselines.
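The gaps implied by these figures can be worked out directly. The short Python sketch below uses only numbers quoted in this card; the variable names are illustrative, and the per-document times are simple reciprocals of the reported throughput, so they say nothing about hardware or batch settings, which are not stated here.

```python
# Figures quoted in this card.
popo = {"teds": 90.6, "docs_per_s": 0.37}
qwen = {"teds": 78.0, "docs_per_s": 0.04}
uplift = {"MinerU": (53.7, 90.6), "MonkeyOCR": (48.9, 87.4)}

# Accuracy gap and throughput ratio against Qwen3-VL-32B.
teds_gap = round(popo["teds"] - qwen["teds"], 1)
speedup = round(popo["docs_per_s"] / qwen["docs_per_s"], 2)

# Average seconds per document, as the reciprocal of throughput.
seconds_per_doc = {
    "MinerU-Popo": round(1 / popo["docs_per_s"], 1),
    "Qwen3-VL-32B": round(1 / qwen["docs_per_s"], 1),
}

# TEDS points gained when post-processing each OCR system's output.
gains = {name: round(after - before, 1) for name, (before, after) in uplift.items()}

print(teds_gap, speedup, seconds_per_doc, gains)
```

On these numbers the model is 12.6 TEDS points ahead of Qwen3-VL-32B at roughly nine times the throughput, and the post-processing step adds 36.9 points for MinerU output and 38.5 for MonkeyOCR output.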

Good For

Developers needing to convert raw OCR output into structured, semantically rich documents.
Applications requiring accurate document hierarchy and understanding from scanned or image-based text.
Enhancing retrieval-augmented generation (RAG) systems by providing better structured input documents.
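As an illustration of the RAG use case, the following Python sketch flattens a document tree into retrieval passages that each carry their section path. The nested-dict tree format and the `to_passages` helper are assumptions made for this example, not the model's actual output schema.

```python
def to_passages(node, path=()):
    """Flattens a nested document tree into passages tagged with their section path."""
    passages = []
    for child in node.get("children", []):
        if child["kind"] == "title":
            # Descend into the section, extending the heading path.
            passages.extend(to_passages(child, path + (child["text"],)))
        else:
            passages.append({
                "section": " > ".join(path),
                "kind": child["kind"],
                "text": child["text"],
            })
    return passages


# Hypothetical tree, shaped like the structured output described above.
doc = {
    "kind": "root", "text": "", "children": [
        {"kind": "title", "text": "1 Introduction", "children": [
            {"kind": "text", "text": "Opening paragraph."},
            {"kind": "title", "text": "1.1 Scope", "children": [
                {"kind": "table", "text": "Table 1"},
            ]},
        ]},
    ],
}

passages = to_passages(doc)
for p in passages:
    print(f'[{p["section"]}] ({p["kind"]}) {p["text"]}')
```

Indexing the section path alongside each passage is one simple way a retriever could exploit the hierarchy; whether it helps in a given pipeline would need to be measured.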
