prithivMLmods/proxima-ocr-d.markdown-post3.0.l
prithivMLmods/proxima-ocr-d.markdown-post3.0.l is an 8 billion parameter experimental document AI multimodal model, fine-tuned on Qwen3-VL-8B-Instruct, specializing in high-precision OCR and structured document reconstruction. It converts documents into Markdown, HTML-Markdown, and hybrid enriched formats, preserving complex layouts, semantic ordering, and embedding inline programming languages. This model excels at interpreting tables, forms, and mathematical content for various documentation and knowledge base applications.
Loading preview...
Overview
prithivMLmods/proxima-ocr-d.markdown-post3.0.l is an 8 billion parameter experimental multimodal model built upon Qwen3-VL-8B-Instruct, designed for advanced document AI tasks. Its core capability lies in transforming complex visual documents into structured Markdown, HTML-Markdown, and other enriched documentation formats. The model focuses on maintaining layout hierarchy, formatting consistency, and semantic ordering, even for intricate structures like tables, forms, and mathematical content.
Key Capabilities
- Dynamic Markdown Reconstruction: Converts complex documents to structured Markdown or HTML-Markdown, preserving layout, formatting, and semantic order.
- Inline Code and Language Embedding: Directly adapts Python, JavaScript, LaTeX, and shell syntax into reconstructed documents.
- High Fidelity OCR and Visual Parsing: Accurately recognizes text across structured and unstructured scanned documents, including multi-page layouts.
- Complex Layout Interpretation: Interprets tables, grids, equations, graphs, multi-column layouts, and forms without structural distortion.
- Multimodal Long Reasoning: Supports advanced document question answering and reasoning across extensive input streams like slides and manuscripts.
Intended Use Cases
- OCR to Markdown or HTML-Markdown conversion.
- Complex document reconstruction and formatting regeneration.
- Multi-page document reasoning and retrieval.
- Table extraction and structured output transformation.
- Mathematical OCR and LaTeX conversion.
- Form extraction and structured entity generation.
- Knowledge base indexing and large document QA.
- Documentation regeneration for enterprise automation.
Limitations
Users should be aware that accuracy may decrease with extremely damaged or poorly scanned images. The model requires significant GPU VRAM for long sequences and multi-page documents, and language accuracy can vary for low-resource scripts. Complex objects or highly irregular layouts may occasionally result in formatting misalignment.