Overview

md-reheader is a specialized 0.6-billion-parameter language model, fine-tuned from Qwen/Qwen3-0.6B and developed by Joel Barmettler. It restores the correct heading hierarchy (H1-H6) in markdown documents whose structure has been flattened, often by PDF-to-markdown conversion tools. The input is prepared by flattening every heading to # and stripping body text while preserving structural cues; the model then predicts the appropriate level for each heading.
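
The preparation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing: the function name flatten_and_strip is hypothetical, and the sketch assumes that all body text is dropped, whereas the real pipeline may keep some of it as a structural cue.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Opening or closing code fence (three or more backticks or tildes).
FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})")


def flatten_and_strip(markdown: str) -> str:
    """Hypothetical helper: keep only headings, all rewritten to level 1."""
    kept = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE_RE.match(line):
            in_fence = not in_fence  # toggle on every fence line
            continue
        if in_fence:
            continue  # a '#' inside code is a comment, not a heading
        match = HEADING_RE.match(line)
        if match:
            kept.append("# " + match.group(2))
    return "\n".join(kept)


sample = "## Intro\nSome text.\n#### Setup\n~~~\n# not a heading\n~~~\n# Usage\n"
print(flatten_and_strip(sample))
```

For the sample above, the three real headings survive as level-1 headings, and the commented line inside the fence is ignored. A model would then be asked to assign a level from 1 to 6 to each remaining line.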

Key Capabilities

Heading Hierarchy Restoration: Accurately predicts and applies correct H1-H6 markdown heading levels.
High Accuracy: Achieves 56.1% exact match and 80.6% per-heading accuracy on a benchmark of 7,321 documents from GitHub and Wikipedia.
Efficient Processing: Can reheader documents up to 8k tokens (after stripping) in seconds on a GPU, or minutes on a CPU.
Flexible Deployment: Offers CLI, Python API, direct transformers usage, and remote inference via vLLM or other OpenAI-compatible endpoints.
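
To make the two accuracy figures easier to interpret, here is a small sketch of how such metrics are commonly computed. The definitions are assumptions, not taken from the benchmark itself: exact match is read as the share of documents in which every heading level is predicted correctly, and per-heading accuracy as the share of correct headings pooled over all documents. The function name score is hypothetical.

```python
from typing import List, Tuple


def score(predicted: List[List[int]], gold: List[List[int]]) -> Tuple[float, float]:
    """Return (exact_match, per_heading_accuracy) under the assumed definitions."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same documents")
    exact_docs = 0
    correct = 0
    total = 0
    for pred, ref in zip(predicted, gold):
        if len(pred) != len(ref):
            raise ValueError("each document needs one prediction per heading")
        hits = sum(p == r for p, r in zip(pred, ref))
        correct += hits
        total += len(ref)
        exact_docs += int(hits == len(ref))  # document counts only if all match
    exact_match = exact_docs / len(gold) if gold else 0.0
    per_heading = correct / total if total else 0.0
    return exact_match, per_heading


# Two toy documents: the first is fully correct, the second has one wrong level.
exact, per_heading = score(
    predicted=[[1, 2, 2], [1, 2, 3, 3]],
    gold=[[1, 2, 2], [1, 2, 3, 2]],
)
```

In the toy data, one of two documents matches exactly while six of seven headings are correct, which shows why exact match is always the stricter of the two numbers.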

Good for

Fixing PDF-to-Markdown Output: Corrects document structure in the output of tools like MinerU, Docling, or Marker.
Improving RAG Systems: Supports more accurate chunking for Retrieval-Augmented Generation by restoring the logical document hierarchy.
Enhancing Document Navigation: Re-establishes a functional table of contents and improves readability for users.
Automated Document Processing: Integrates easily into workflows requiring structured markdown output.
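
The RAG use case depends on heading levels because many chunkers split on headings and attach the section path to each chunk. The sketch below illustrates that idea only; it is not part of md-reheader, the function name chunk_by_heading is hypothetical, and for brevity it does not skip fenced code blocks.

```python
import re
from typing import Dict, List, Tuple

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_heading(markdown: str) -> List[Dict[str, object]]:
    """Split markdown into chunks, each tagged with its heading path."""
    chunks: List[Dict[str, object]] = []
    path: List[Tuple[int, str]] = []  # stack of (level, title) for open sections
    body: List[str] = []

    def flush() -> None:
        # Emit the text collected so far under the current heading path.
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": [title for _, title in path], "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Close every section at the same or a deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro.\n## Install\nRun it.\n### Linux\nApt.\n## Usage\nCall it."
for chunk in chunk_by_heading(doc):
    print(" > ".join(chunk["path"]), "|", chunk["text"])
```

If every heading in the same document were flattened to #, each chunk would carry a single-element path, and the relationship between "Install" and "Linux" would be lost. Restored levels are what make the path meaningful.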
