joelbarmettler/md-reheader

Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | Published: Apr 5, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

joelbarmettler/md-reheader is a 0.6 billion parameter Qwen3 fine-tuned language model developed by Joel Barmettler. This specialized model is designed to restore heading hierarchy in markdown documents, predicting correct H1-H6 levels for flattened headings. It excels at fixing document structure after PDF-to-markdown conversion, making it ideal for improving RAG chunking and document navigation.


Overview

md-reheader is a 0.6 billion parameter language model, fine-tuned from Qwen/Qwen3-0.6B and developed by Joel Barmettler. Its core function is to restore the correct heading hierarchy (H1-H6 levels) in markdown documents whose structure has been flattened, often by PDF-to-markdown conversion tools. It processes a document in three steps: flatten all headings to #, strip body text while preserving structural cues, and predict the appropriate level for each heading.
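
The flatten-and-strip step can be illustrated with a short sketch. This is not md-reheader's actual preprocessing code. The function name `flatten_headings` is hypothetical, and the sketch assumes that every body line is dropped and that fenced code blocks are skipped so that `#` comments inside them are not mistaken for headings. The real tool may keep more context than this.

```python
import re

# ATX heading: up to 3 leading spaces, 1-6 '#', whitespace, then the title.
HEADING_RE = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.+?)[ \t]*$")
# Opening or closing code fence (simplified: only the fence character is tracked).
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def flatten_headings(markdown: str) -> tuple[str, list[int]]:
    """Return (outline with every heading set to '#', original heading levels).

    Hypothetical helper for illustration; body text is dropped entirely.
    """
    outline: list[str] = []
    levels: list[int] = []
    fence = None  # fence character of the currently open code block, if any
    for line in markdown.splitlines():
        fence_match = FENCE_RE.match(line)
        if fence_match:
            char = fence_match.group(1)[0]
            if fence is None:
                fence = char      # entering a code block
            elif char == fence:
                fence = None      # leaving the code block
            continue
        if fence is not None:
            continue              # ignore everything inside code blocks
        heading = HEADING_RE.match(line)
        if heading:
            levels.append(len(heading.group(1)))
            outline.append("# " + heading.group(2))
    return "\n".join(outline), levels
```

The returned levels are the ground truth the model would have to recover from the flattened outline alone, which is why the title text and heading order are the main structural cues left.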

Key Capabilities

  • Heading Hierarchy Restoration: Accurately predicts and applies correct H1-H6 markdown heading levels.
  • Measured Accuracy: Achieves 56.1% exact-match accuracy and 80.6% per-heading accuracy on a benchmark of 7,321 documents from GitHub and Wikipedia.
  • Efficient Processing: Can reheader documents up to 8k tokens (after stripping) in seconds on a GPU, or minutes on a CPU.
  • Flexible Deployment: Offers CLI, Python API, direct transformers usage, and remote inference via vLLM or other OpenAI-compatible endpoints.
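
The two accuracy figures above measure different things, and a small sketch may make the gap between them easier to read. The definitions here are assumptions, since the source does not spell them out: exact match is taken to mean that every heading level in a document is predicted correctly, and per-heading accuracy is taken to be the fraction of individual headings predicted correctly across all documents. Both function names are hypothetical.

```python
def exact_match(predicted: list[list[int]], reference: list[list[int]]) -> float:
    """Fraction of documents whose predicted levels match the reference exactly.

    Assumed definition; each inner list holds one document's heading levels.
    """
    if not reference:
        return 0.0
    matches = sum(pred == ref for pred, ref in zip(predicted, reference))
    return matches / len(reference)


def per_heading_accuracy(predicted: list[list[int]], reference: list[list[int]]) -> float:
    """Fraction of individual headings with the correct level (micro-average).

    Assumed definition; headings are compared position by position.
    """
    total = sum(len(ref) for ref in reference)
    if total == 0:
        return 0.0
    correct = sum(
        p == r
        for pred, ref in zip(predicted, reference)
        for p, r in zip(pred, ref)
    )
    return correct / total
```

Under these definitions a single wrong level fails the whole document for exact match but costs only one heading in the per-heading score, so the first number is expected to sit well below the second.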

Good for

  • Fixing PDF-to-Markdown Output: Essential for correcting document structure from tools like MinerU, Docling, or Marker.
  • Improving RAG Systems: Ensures accurate chunking for Retrieval Augmented Generation by restoring logical document flow.
  • Enhancing Document Navigation: Re-establishes functional Tables of Contents and improves readability for users.
  • Automated Document Processing: Integrates easily into workflows requiring structured markdown output.
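
To show why a restored hierarchy matters for RAG chunking, here is a minimal sketch of heading-aware chunking. It is independent of md-reheader and only assumes its output is ordinary markdown with correct heading levels. The function name `chunk_by_headings` and the chunk layout (`path`, `text`) are hypothetical choices for illustration, and fenced code blocks are not handled.

```python
import re

# ATX heading at the start of a line: 1-6 '#', whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$")


def chunk_by_headings(markdown: str) -> list[dict]:
    """Split markdown into one chunk per heading, tagged with its ancestor path.

    Hypothetical helper; text before the first heading is discarded.
    """
    chunks: list[dict] = []
    path: list[tuple[int, str]] = []  # stack of (level, title) for open sections
    current = None
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            # Close every section at the same or a deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, title))
            current = {"path": [t for _, t in path], "lines": []}
            chunks.append(current)
        elif current is not None and line.strip():
            current["lines"].append(line)
    return [{"path": c["path"], "text": "\n".join(c["lines"])} for c in chunks]
```

With flattened headings every chunk would get a one-element path, so a chunk titled "Linux" would lose the fact that it belongs under "Install". With correct levels the path carries that context into retrieval.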