DreamEternal/MinerU-Popo

Vision | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: May 22, 2026 | License: MIT | Architecture: Transformer | Open Weights | Cold

DreamEternal/MinerU-Popo is a 4-billion-parameter post-processing model that bridges the gap between page-level OCR output and document-level semantic structure. It builds a document tree by performing four analyses: table truncation, text truncation, title hierarchy, and image-text association. It handles long documents through dynamic chunking with cross-chunk synchronization, which keeps the resulting structure globally consistent, and it can further enrich that structure. It substantially raises the hierarchical accuracy (TEDS) of output from several OCR systems, which makes it well suited to improving document understanding and downstream retrieval.


MinerU-Popo: Enhancing OCR with Document Structure

DreamEternal/MinerU-Popo is a 4-billion-parameter model for post-processing OCR output into a coherent, document-level semantic structure. Rather than performing OCR itself, it transforms raw page-level text into a structured document tree. In doing so it addresses two challenges: geometric discontinuity across page boundaries (for example, a table or paragraph split between two pages) and scalability to long documents.
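
To make "page-level text to document tree" concrete, here is a minimal sketch. It assumes the truncation and title-hierarchy decisions have already been made and are passed in as labels. The `Block` and `Node` classes, the `continues_previous` flag, and `build_tree` are all hypothetical illustrations, not MinerU-Popo's actual input or output format. The model card does not publish one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """A page-level OCR block. Hypothetical schema, not MinerU-Popo's actual format."""
    kind: str                         # "title", "text", "table", or "image"
    text: str
    page: int
    level: Optional[int] = None       # heading depth for titles (1 = top level)
    continues_previous: bool = False  # True if the block resumes one cut at a page break


@dataclass
class Node:
    kind: str
    text: str
    level: int = 0
    children: List["Node"] = field(default_factory=list)


def build_tree(blocks: List[Block]) -> Node:
    """Nest blocks under titles by heading level and merge cross-page continuations."""
    root = Node(kind="root", text="")
    stack = [root]  # path from the root to the currently open section
    for b in blocks:
        parent = stack[-1]
        # Truncation: append a continuation to the previous block of the same kind.
        if b.continues_previous and parent.children and parent.children[-1].kind == b.kind:
            parent.children[-1].text += b.text
            continue
        if b.kind == "title":
            level = b.level or 1
            # Close any open sections at the same or a deeper level.
            while len(stack) > 1 and stack[-1].level >= level:
                stack.pop()
            node = Node(kind="title", text=b.text, level=level)
            stack[-1].children.append(node)
            stack.append(node)
        else:
            parent.children.append(Node(kind=b.kind, text=b.text))
    return root


blocks = [
    Block("title", "1 Introduction", page=1, level=1),
    Block("text", "OCR splits this para", page=1),
    Block("text", "graph across pages.", page=2, continues_previous=True),
    Block("title", "1.1 Scope", page=2, level=2),
    Block("table", "<table>...</table>", page=2),
    Block("title", "2 Method", page=3, level=1),
]
tree = build_tree(blocks)
print([c.text for c in tree.children])  # ['1 Introduction', '2 Method']
```

In the real model these labels (continuation, heading level) are what the subtasks predict. The sketch only shows how such predictions could be assembled into a tree once they exist.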

Key Capabilities

  • Document Tree Construction: Builds a hierarchical structure from OCR results, integrating tables, text, titles, and image-text associations.
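  • Four Subtasks: Performs table truncation analysis and text truncation analysis (deciding whether a table or paragraph cut at a page break continues on the next page), title hierarchy analysis (assigning heading levels), and image-text association analysis (linking images to their related text).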
  • Dynamic Chunking: Processes long documents efficiently by dynamically chunking content and synchronizing across chunks to maintain global consistency.
  • Document Enrichment: Beyond structuring, it can generate semantic summaries and split overly long section nodes.
  • Improved Hierarchical Accuracy: Substantially raises TEDS (Tree-Edit-Distance-based Similarity) scores for baseline output from several OCR systems (e.g., MinerU from 53.7 to 90.6 and MonkeyOCR from 48.9 to 87.4).
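
The card does not describe how dynamic chunking and synchronization work internally. The following is one plausible sketch, assuming a simple greedy split. It divides a long sequence of items into chunks under a size budget (characters here, as a stand-in for tokens), and it carries the open heading path across chunk boundaries so that each chunk starts with the global section context. `chunk_with_context`, `budget`, and the `(text, level)` item format are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Item = Tuple[str, Optional[int]]  # (text, heading level or None for body content)


@dataclass
class Chunk:
    items: List[Item]
    context: List[str]  # heading path still open when this chunk begins


def chunk_with_context(items: List[Item], budget: int) -> List[Chunk]:
    """Greedily split items under a character budget, carrying the open heading path forward."""
    chunks: List[Chunk] = []
    current: List[Item] = []
    size = 0
    open_path: List[Tuple[int, str]] = []  # (level, title) stack shared across chunks
    start_context: List[str] = []
    for text, level in items:
        # Flush when adding this item would exceed the budget (an oversized item gets its own chunk).
        if current and size + len(text) > budget:
            chunks.append(Chunk(current, start_context))
            current, size = [], 0
            start_context = [title for _, title in open_path]
        current.append((text, level))
        size += len(text)
        if level is not None:
            # Keep the path globally consistent: a heading closes siblings and deeper levels.
            while open_path and open_path[-1][0] >= level:
                open_path.pop()
            open_path.append((level, text))
    if current:
        chunks.append(Chunk(current, start_context))
    return chunks


items = [("Intro", 1), ("a" * 50, None), ("Scope", 2),
         ("b" * 50, None), ("Method", 1), ("c" * 50, None)]
for chunk in chunk_with_context(items, budget=60):
    print(chunk.context, len(chunk.items))
# [] 3
# ['Intro', 'Scope'] 2
# ['Method'] 1
```

Passing `context` along with each chunk lets a per-chunk model interpret heading levels relative to the whole document instead of restarting at the top level in every chunk.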

Performance Highlights

MinerU-Popo reaches a TEDS score of 90.6 at 0.37 documents/second. On this task it outperforms the much larger general-purpose vision-language model Qwen3-VL-32B (78.0 TEDS, 0.04 documents/second) in both accuracy and throughput. It also benefits downstream retrieval and analysis: on the ViDoRe V3 benchmark, it improves accuracy across several categories compared with raw RAG baselines.
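
For reference, TEDS as commonly defined (introduced for table-recognition evaluation with PubTabNet) normalizes the tree edit distance between a predicted tree T_a and a ground-truth tree T_b by the size of the larger tree, where |T| is the number of nodes. The scores above appear to be reported on a 0 to 100 scale. The card does not state which TEDS variant or tree encoding its evaluation uses, so treat the first line as the standard definition rather than the exact protocol. The last two lines are simple arithmetic on the reported figures.

```latex
\begin{aligned}
\mathrm{TEDS}(T_a, T_b) &= 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max(|T_a|,\,|T_b|)} \\
\text{speedup over Qwen3-VL-32B} &= \frac{0.37~\text{doc/s}}{0.04~\text{doc/s}} \approx 9.3 \\
\Delta\mathrm{TEDS} &= 90.6 - 78.0 = 12.6
\end{aligned}
```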

Good For

  • Developers needing to convert raw OCR output into structured, semantically rich documents.
  • Applications requiring accurate document hierarchy and understanding from scanned or image-based text.
  • Enhancing retrieval-augmented generation (RAG) systems by providing better structured input documents.
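
One common way a document tree can help RAG is to attach each content node's section path to its text before indexing, so that retrieved passages carry their structural context. The sketch below assumes a hypothetical nested-dict tree (`kind`, `text`, `children`). This is not MinerU-Popo's output format, and `tree_to_passages` is an illustrative helper, not part of any published API.

```python
from typing import Dict, List, Tuple


def tree_to_passages(node: Dict, path: Tuple[str, ...] = ()) -> List[Tuple[str, str]]:
    """Depth-first walk returning (heading breadcrumb, content) for every non-title node."""
    passages = []
    for child in node.get("children", []):
        if child["kind"] == "title":
            passages.extend(tree_to_passages(child, path + (child["text"],)))
        else:
            passages.append((" > ".join(path), child["text"]))
    return passages


doc = {"kind": "root", "children": [
    {"kind": "title", "text": "1 Introduction", "children": [
        {"kind": "text", "text": "Background paragraph."},
        {"kind": "title", "text": "1.1 Scope", "children": [
            {"kind": "table", "text": "<table>...</table>"},
        ]},
    ]},
    {"kind": "title", "text": "2 Method", "children": [
        {"kind": "text", "text": "Method paragraph."},
    ]},
]}

# Prefix each passage with its section path before embedding/indexing.
for crumb, text in tree_to_passages(doc):
    print(f"[{crumb}] {text}")
# [1 Introduction] Background paragraph.
# [1 Introduction > 1.1 Scope] <table>...</table>
# [2 Method] Method paragraph.
```

Keeping tables and paragraphs as separate passages while sharing the heading breadcrumb is one design choice among several. Merging small siblings into a single passage is an equally reasonable alternative.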