bakrianoo/arabic-legal-documents-ocr-1.0

Warm
Public
Vision
4.3B
BF16
32768
1
Jan 13, 2026
License: gemma
Hugging Face
Overview

Model Overview

The bakrianoo/arabic-legal-documents-ocr-1.0 is a specialized Vision Language Model (VLM) developed by bakrianoo, finetuned from Gemma-3-4B-IT. Its core purpose is to accurately extract structured data from scanned Arabic legal documents, particularly those of low quality. The model utilizes VLM reasoning to interpret and process visual information from these documents, converting it into structured data.

Key Capabilities

  • Structured Data Extraction: Designed to pull specific information into a structured format (e.g., JSON) from legal documents.
  • Handles Low-Quality Scans: Optimized to perform effectively even with challenging inputs like blurry or poorly scanned documents.
  • Arabic Legal Document Focus: Tailored specifically for the nuances and complexities of Arabic legal texts.
  • VLM Reasoning: Leverages advanced Vision Language Model capabilities for robust OCR and data interpretation.

Usage and Preprocessing

To achieve optimal results, images must undergo mandatory preprocessing, including resizing and conversion to grayscale, before being fed to the model. Utility functions are provided for both standard PIL usage and Base64 encoding for API integrations (vLLM/OpenAI API). The model supports inference via transformers for local use and vLLM for high-performance serving, with robust JSON parsing capabilities using json-repair to handle potential output inconsistencies.

Good For

  • Automating data entry from scanned Arabic legal documents.
  • Building applications that require structured information extraction from challenging document images.
  • Research and development in Arabic OCR and VLM applications for specialized domains.