Name: MohamedSamyAI/legal-documents-ocr-parser-1.0 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: MohamedSamyAI

Arabic Legal Documents OCR Parser

This model, developed by MohamedSamyAI, is a multimodal vision-language model (4.3 billion parameters) fine-tuned from Google's Gemma-3-4B-IT. Its core purpose is to extract structured metadata in JSON format from scanned images of Arabic legal documents. Unlike general OCR, this model focuses on classifying and extracting specific fields rather than full-text recognition.

Key Capabilities

Structured Metadata Extraction: Outputs comprehensive JSON objects containing classified document metadata, including document type, issuing authority, physical properties, official seals/stamps, signatures, and routing information.
Multimodal Processing: Takes an image of a legal document page as input and processes it to extract textual and structural information.
Specialized for Arabic Legal Documents: Fine-tuned on a custom dataset of Arabic government regulations, ministerial correspondence, and institutional records, ensuring high relevance and accuracy for this domain.
Full-Precision Merged Weights: The LoRA adapter used for fine-tuning has been merged into the base model, providing full-precision weights ready for direct inference without additional adapter loading or quantization.

Good for

Automated Document Digitization: Streamlining the cataloging and indexing of large archives of scanned Arabic legal documents.
Legal Document Triage: Rapidly classifying document types and identifying key entities like issuing authorities.
Metadata Auto-Population: Integrating into Document Management Systems (DMS) to automatically populate metadata fields.
RAG Pipelines: Serving as a structured extraction layer to feed precise metadata into Retrieval-Augmented Generation systems.

Limitations

Not for Full-Text OCR: Does not extract the full body text of documents.
Domain Specificity: Performance is not guaranteed for non-Arabic or non-legal document types.
Human Review Recommended: Outputs should be verified by a human for critical legal applications due to potential hallucination risks.

Overview

Arabic Legal Documents OCR Parser

Key Capabilities

Good for

Limitations

Full Model Card (README)