Overview
md-reheader is a language model developed by Joel Barmettler and fine-tuned from Qwen3-0.6B. It restores the heading hierarchy (H1-H6) of markdown documents whose structure has been lost, typically because PDF-to-markdown conversion flattened the headings. The model uses the document's context and semantics to predict the appropriate level for each heading.
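To make the task concrete, the sketch below shows the two bookkeeping steps around such a model: flattening headings (the damage a converter might cause) and writing a list of predicted levels back into the text. It does not call md-reheader. The helper names `flatten_headings` and `apply_levels` are hypothetical, and the sketch assumes ATX-style `#` headings, flattening to a single level, and a simplified treatment of fenced code blocks.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.*)$")
# Fence markers built programmatically so this listing stays easy to embed.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def _heading_lines(lines):
    """Yield (index, title) for heading lines outside fenced code blocks."""
    in_fence = False
    for i, line in enumerate(lines):
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence  # simplified: any fence line toggles
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            yield i, match.group(2)


def flatten_headings(markdown):
    """Simulate a lossy conversion: force every heading to level 1."""
    lines = markdown.split("\n")
    for i, title in _heading_lines(lines):
        lines[i] = "# " + title
    return "\n".join(lines)


def apply_levels(markdown, levels):
    """Rewrite headings using one predicted level (1-6) per heading, in order."""
    lines = markdown.split("\n")
    headings = list(_heading_lines(lines))
    if len(headings) != len(levels):
        raise ValueError("expected exactly one level per heading")
    for (i, title), level in zip(headings, levels):
        if not 1 <= level <= 6:
            raise ValueError(f"invalid heading level: {level}")
        lines[i] = "#" * level + " " + title
    return "\n".join(lines)
```

In a real pipeline, the `levels` list would come from the model's predictions. Here it is simply an input, so the helpers can be tested without the model.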
Key Capabilities
- Heading Hierarchy Restoration: Accurately re-establishes logical heading levels (H1 through H6) in markdown files.
- Contextual Understanding: Utilizes document context to infer the correct structural relationships between headings.
- High Accuracy: Achieves 80.6% per-heading accuracy and 56.1% exact match on test documents, significantly outperforming heuristic methods.
- Efficient Processing: Restores headings in documents of up to 8k tokens within seconds on a GPU.
Performance Highlights
Evaluated on 7,321 GitHub and Wikipedia markdown documents, md-reheader demonstrates strong performance:
- Exact match: 56.1%
- Per-heading accuracy: 80.6%
- Hierarchy preservation: 91.0%
- Mean absolute error: 0.22
While accuracy for very deep nesting (H5/H6) is lower (45-50%), the model generally preserves relative structure well.
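The metrics above can be illustrated for a single document with a minimal sketch. The function `score_document` is hypothetical and is not the project's evaluation code. The source does not define hierarchy preservation, so the definition used here is an assumption: the fraction of consecutive heading pairs whose direction (deeper, same, shallower) matches the reference. The source also does not say whether the reported figures are averaged per document or over all headings.

```python
def _sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def score_document(predicted, reference):
    """Compare predicted heading levels with reference levels for one document."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("need two non-empty lists of equal length")

    n = len(reference)
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    correct = sum(e == 0 for e in errors)

    # Assumed definition: a transition is preserved when the predicted step
    # between neighbouring headings goes in the same direction as the reference.
    pairs = n - 1
    preserved = sum(
        _sign(predicted[i + 1] - predicted[i])
        == _sign(reference[i + 1] - reference[i])
        for i in range(pairs)
    )

    return {
        "exact_match": correct == n,
        "per_heading_accuracy": correct / n,
        "mean_absolute_error": sum(errors) / n,
        "hierarchy_preservation": preserved / pairs if pairs else 1.0,
    }
```

For example, with reference levels `[1, 2, 3, 2]` and predictions `[1, 2, 2, 2]`, three of four headings are correct, so the document is not an exact match, per-heading accuracy is 0.75, and the mean absolute error is 0.25. Under the assumed definition, only one of the three transitions is preserved.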
Good For
- Post-processing PDF-to-Markdown Conversions: Ideal for cleaning up output from tools such as MinerU, Docling, or Marker, which often flatten document structure.
- Automated Document Structuring: Useful for pipelines requiring consistent and correct markdown heading hierarchies.
- Improving Document Readability: Enhances the navigability and readability of automatically generated markdown content.