MohamedSamyAI/legal-documents-ocr-parser-1.0

Hugging Face
VISIONConcurrency Cost:1Model Size:4.3BQuant:BF16Ctx Length:32kPublished:May 10, 2026Architecture:Transformer Warm

MohamedSamyAI/legal-documents-ocr-parser-1.0 is a 4.3 billion parameter multimodal vision-language model, fine-tuned from Google's Gemma-3-4B-IT. Developed by MohamedSamyAI, it specializes in extracting structured JSON metadata from scanned Arabic legal documents, including government regulations and official correspondence. This model is optimized for automated document digitization pipelines and legal document triage, providing classified metadata such as document type, issuing authority, and official marks.

Loading preview...

Arabic Legal Documents OCR Parser

This model, developed by MohamedSamyAI, is a multimodal vision-language model (4.3 billion parameters) fine-tuned from Google's Gemma-3-4B-IT. Its core purpose is to extract structured metadata in JSON format from scanned images of Arabic legal documents. Unlike general OCR, this model focuses on classifying and extracting specific fields rather than full-text recognition.

Key Capabilities

  • Structured Metadata Extraction: Outputs comprehensive JSON objects containing classified document metadata, including document type, issuing authority, physical properties, official seals/stamps, signatures, and routing information.
  • Multimodal Processing: Takes an image of a legal document page as input and processes it to extract textual and structural information.
  • Specialized for Arabic Legal Documents: Fine-tuned on a custom dataset of Arabic government regulations, ministerial correspondence, and institutional records, ensuring high relevance and accuracy for this domain.
  • Full-Precision Merged Weights: The LoRA adapter used for fine-tuning has been merged into the base model, providing full-precision weights ready for direct inference without additional adapter loading or quantization.

Good for

  • Automated Document Digitization: Streamlining the cataloging and indexing of large archives of scanned Arabic legal documents.
  • Legal Document Triage: Rapidly classifying document types and identifying key entities like issuing authorities.
  • Metadata Auto-Population: Integrating into Document Management Systems (DMS) to automatically populate metadata fields.
  • RAG Pipelines: Serving as a structured extraction layer to feed precise metadata into Retrieval-Augmented Generation systems.

Limitations

  • Not for Full-Text OCR: Does not extract the full body text of documents.
  • Domain Specificity: Performance is not guaranteed for non-Arabic or non-legal document types.
  • Human Review Recommended: Outputs should be verified by a human for critical legal applications due to potential hallucination risks.