joelbarmettler/md-reheader
joelbarmettler/md-reheader is a 0.6-billion-parameter language model fine-tuned from Qwen3 and developed by Joel Barmettler. It is designed to restore heading hierarchy in markdown documents by predicting the correct H1-H6 level for each flattened heading. It is aimed at fixing document structure after PDF-to-markdown conversion, which makes it well suited to improving RAG chunking and document navigation.
Overview
md-reheader is a specialized 0.6-billion-parameter language model, fine-tuned from Qwen/Qwen3-0.6B and developed by Joel Barmettler. Its core function is to restore the correct heading hierarchy (H1-H6 levels) in markdown documents whose structure has been flattened, often by PDF-to-markdown conversion tools. It processes a document in three steps: it flattens every heading to #, strips body text while preserving structural cues, and then predicts the appropriate level for each heading.
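The flatten-then-relabel idea described above can be illustrated with a minimal sketch. It is not md-reheader's own code: the function names `flatten_headings` and `apply_levels` are hypothetical, the body-stripping step is left out because its exact rules are not described here, and the predicted levels are passed in as a plain list where the model would normally supply them.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the heading text.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.*)$")
# Opening or closing code fence (three or more backticks or tildes).
FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})")


def _heading_line_indices(lines):
    """Return indices of heading lines, ignoring anything inside code fences."""
    indices = []
    in_fence = False
    for i, line in enumerate(lines):
        if FENCE_RE.match(line):
            in_fence = not in_fence
            continue
        if not in_fence and HEADING_RE.match(line):
            indices.append(i)
    return indices


def flatten_headings(markdown):
    """Rewrite every heading as a level-1 heading (hypothetical helper)."""
    lines = markdown.split("\n")
    for i in _heading_line_indices(lines):
        lines[i] = "# " + HEADING_RE.match(lines[i]).group(2)
    return "\n".join(lines)


def apply_levels(markdown, levels):
    """Apply one predicted level (1-6) per heading, in document order."""
    lines = markdown.split("\n")
    indices = _heading_line_indices(lines)
    if len(indices) != len(levels):
        raise ValueError("expected one level per heading")
    for i, level in zip(indices, levels):
        if not 1 <= level <= 6:
            raise ValueError("heading levels must be between 1 and 6")
        lines[i] = "#" * level + " " + HEADING_RE.match(lines[i]).group(2)
    return "\n".join(lines)
```

In this sketch the model's only job is to produce the `levels` list; everything else is plain text manipulation, which keeps the body of the document untouched.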
Key Capabilities
- Heading Hierarchy Restoration: Accurately predicts and applies correct H1-H6 markdown heading levels.
- Benchmarked Accuracy: Achieves 56.1% exact-match accuracy and 80.6% per-heading accuracy on a benchmark of 7,321 documents from GitHub and Wikipedia.
- Efficient Processing: Can reheader documents up to 8k tokens (after stripping) in seconds on a GPU, or minutes on a CPU.
- Flexible Deployment: Offers a CLI, a Python API, direct transformers usage, and remote inference via vLLM or other OpenAI-compatible endpoints.
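To make the two accuracy figures above easier to compare, here is a small sketch of how such metrics are commonly computed. The definitions are assumptions, not taken from the benchmark itself: exact match is read as the share of documents in which every heading level is correct, and per-heading accuracy as the share of correct headings pooled across all documents. The function names are illustrative.

```python
def exact_match_rate(predicted, expected):
    """Share of documents whose predicted levels all match (assumed definition)."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must cover the same documents")
    if not predicted:
        return 0.0
    hits = sum(1 for p, e in zip(predicted, expected) if list(p) == list(e))
    return hits / len(predicted)


def per_heading_accuracy(predicted, expected):
    """Share of correct headings pooled over all documents (assumed definition)."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must cover the same documents")
    correct = 0
    total = 0
    for p, e in zip(predicted, expected):
        if len(p) != len(e):
            raise ValueError("each document needs one prediction per heading")
        correct += sum(1 for a, b in zip(p, e) if a == b)
        total += len(e)
    return correct / total if total else 0.0
```

Under these definitions a single wrong heading fails the whole document for exact match, which is why that figure can sit well below the per-heading figure.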
Good for
- Fixing PDF-to-Markdown Output: Essential for correcting document structure from tools like MinerU, Docling, or Marker.
- Improving RAG Systems: Ensures accurate chunking for Retrieval Augmented Generation by restoring logical document flow.
- Enhancing Document Navigation: Re-establishes functional Tables of Contents and improves readability for users.
- Automated Document Processing: Integrates easily into workflows requiring structured markdown output.
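The link between heading levels and RAG chunking can be sketched as follows. This is an illustrative chunker, not part of md-reheader: `chunk_by_headings` is a hypothetical helper that splits a document at each heading and labels every chunk with its full heading path. For brevity it does not skip fenced code blocks.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the heading text.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.*)$")


def chunk_by_headings(markdown):
    """Split markdown into (heading_path, body) pairs, one per section."""
    chunks = []
    path = []  # stack of (level, title) for the current section
    body = []  # body lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if path or text:
            chunks.append((" > ".join(title for _, title in path), text))

    for line in markdown.split("\n"):
        match = HEADING_RE.match(line)
        if match:
            flush()
            body.clear()
            level = len(match.group(1))
            # Drop siblings and deeper sections before entering the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks
```

With a flattened document every chunk gets a one-element path, so a section such as "Flags" loses the context that it belongs under "Usage". Once the levels are restored, the same chunker yields paths like "Guide > Usage > Flags", which can be stored alongside each chunk as retrieval metadata.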