AyaEhab258/NASAQ4.1

VISIONConcurrency Cost:1Model Size:7.9BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 21, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

AyaEhab258/NASAQ4.1 is a 7.9 billion parameter vision-language model fine-tuned from Gemma 4 E4B for Optical Character Recognition (OCR) of historical Arabic calligraphy. It specializes in transcribing various styles including Naskh, Thuluth, Diwani, Kufic, and Muhaqqaq, trained on the HICMA dataset and custom samples. This model is optimized for accurate text extraction from complex calligraphic images, achieving a Levenshtein Ratio of 86.22% on its test set.

Loading preview...

Overview

AyaEhab258/NASAQ4.1 is a 7.9 billion parameter vision-language model, fine-tuned from Gemma 4 E4B, specifically designed for Optical Character Recognition (OCR) of historical Arabic calligraphy. It focuses on transcribing text from various calligraphic styles such as Naskh, Thuluth, Diwani, Kufic, and Muhaqqaq, utilizing the HICMA dataset alongside custom collected samples.

Training Approach

The model underwent LoRA fine-tuning with an OCR-only objective, meaning it does not perform joint style classification. The training involved a two-phase process: an initial base fine-tune followed by a refinement phase to optimize performance.

Performance Metrics

On a held-out test set of 602 images, the model achieved a Character Error Rate (CER) of 20.65%, a Word Error Rate (WER) of 48.17%, and a Levenshtein Ratio of 86.22%. Performance varies by style, with Naskh and Muhaqqaq showing the lowest CERs (12.9% and 14.1% respectively), while Kufic and Diwani have higher error rates (47.4% and 51.6%) primarily due to limited training data for these specific styles.

Key Capabilities

  • Specialized OCR: Highly effective at transcribing historical Arabic calligraphy.
  • Multi-style Support: Handles Naskh, Thuluth, Diwani, Kufic, and Muhaqqaq scripts.
  • Image-to-Text: Processes image inputs to generate transcribed Arabic text.

When to Use This Model

This model is ideal for researchers, historians, and developers working with historical Arabic documents, manuscripts, or any visual content containing complex Arabic calligraphy that requires accurate text extraction. It is particularly strong for Naskh and Muhaqqaq styles.