Name: ByteDance/Dolphin-v2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ByteDance

Dolphin-v2: Universal Document Parsing

Dolphin-v2, developed by ByteDance, is a 3-billion-parameter document parsing model built on the Qwen2.5-VL-3B backbone. It processes both born-digital and photographed documents, including those with realistic distortions. The model uses a document-type-aware two-stage architecture with scalable anchor prompting.

Key Capabilities

Universal Document Support: Handles a wide array of document types, including scanned and distorted images.
Expanded Element Coverage: Supports 21 distinct element categories, such as hierarchical headings, paragraphs, mathematical formulas (LaTeX), tables (HTML), and code blocks with indentation preservation.
Enhanced Precision: Localizes each element using absolute pixel coordinates.
Hybrid Parsing Strategy: Employs element-wise parallel parsing for digital documents and holistic page-level parsing for photographed documents.
Specialized Modules: Includes dedicated parsing modules for formulas (P_formula), code (P_code), and tables (P_table).
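
The hybrid strategy and specialized modules listed above can be pictured as a small dispatch routine. The following is a minimal sketch only, not Dolphin-v2's actual API: the names Element, select_module, parse_page, the fallback label P_generic, and the two callables are hypothetical stand-ins, and a thread pool stands in for whatever batching the real model uses. It assumes the first stage has already produced a document type and a list of elements with absolute pixel boxes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Absolute pixel coordinates on the page image: (x1, y1, x2, y2).
BBox = Tuple[int, int, int, int]


@dataclass
class Element:
    """One layout element from the first stage (hypothetical structure)."""
    category: str  # e.g. "paragraph", "formula", "code", "table"
    bbox: BBox


# Module labels named in the listing; the fallback label is an assumption.
MODULES: Dict[str, str] = {
    "formula": "P_formula",
    "code": "P_code",
    "table": "P_table",
}
FALLBACK_MODULE = "P_generic"  # hypothetical name for all other categories


def select_module(category: str) -> str:
    """Pick the specialized module for a category, else the fallback."""
    return MODULES.get(category, FALLBACK_MODULE)


def parse_page(
    doc_type: str,
    elements: List[Element],
    parse_element: Callable[[Element, str], str],
    parse_whole_page: Callable[[], str],
) -> List[str]:
    """Route a page through holistic or element-wise parsing."""
    if doc_type == "photographed":
        # Photographed pages are parsed holistically, in a single pass.
        return [parse_whole_page()]
    # Digital pages are parsed element by element, in parallel.
    # pool.map returns results in the same order as the input elements.
    with ThreadPoolExecutor() as pool:
        return list(
            pool.map(lambda e: parse_element(e, select_module(e.category)), elements)
        )
```

In this sketch, parse_element and parse_whole_page would wrap the actual model calls, so the routing logic can be exercised with simple stubs before any model is loaded.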

Performance Highlights

Dolphin-v2 scores 89.45 overall on OmniDocBench v1.5, a 14.78-point improvement over its predecessor. Component scores include 86.72 CDM for formula parsing and 87.02 TEDS for table structure.
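
The overall score and the improvement together imply the predecessor's score by subtraction. This short check assumes both figures refer to the same OmniDocBench v1.5 overall metric. The derived value is not stated in the listing itself.

```python
from decimal import Decimal

# Figures quoted in the listing.
overall_v2 = Decimal("89.45")
improvement = Decimal("14.78")

# Implied predecessor score (derived, not quoted in the listing).
predecessor = overall_v2 - improvement
print(predecessor)  # 74.67
```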

Good For

Automated data extraction from diverse document formats.
Converting complex documents (e.g., research papers, technical manuals) into structured data.
Applications requiring high-accuracy parsing of tables, formulas, and code from images or PDFs.
