Etherll/Tashkeel-700M

TEXT GENERATIONConcurrency Cost:1Model Size:0.7BQuant:BF16Ctx Length:32kPublished:Aug 12, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Etherll/Tashkeel-700M is a 700 million parameter model specifically designed for Arabic diacritization (Tashkeel). It is a fine-tuned version of the LiquidAI/LFM2-700M base model, trained on the arbml/tashkeela dataset. This model excels at accurately adding diacritics to unvocalized Arabic text, making it suitable for natural language processing tasks requiring correctly vocalized Arabic.

Loading preview...

Tashkeel-700M: Arabic Diacritization Model

Etherll/Tashkeel-700M is a specialized language model with 700 million parameters, engineered for the task of Arabic diacritization, also known as Tashkeel. This model was developed by fine-tuning the LiquidAI/LFM2-700M base model. The training utilized the arbml/tashkeela dataset, a dedicated resource for Arabic vocalization.

Key Capabilities

  • Accurate Arabic Diacritization: The model's primary function is to add correct diacritics (vowel marks) to unvocalized Arabic text, ensuring proper pronunciation and grammatical structure.
  • Fine-tuned Performance: By building upon LiquidAI/LFM2-700M and training on a specific dataset, Tashkeel-700M is optimized for its niche task.
  • Efficient Training: The base LFM2 model was trained using Unsloth and Huggingface's TRL library, indicating an efficient development process.

Good For

  • Arabic NLP Applications: Ideal for any application where correctly vocalized Arabic text is crucial, such as text-to-speech systems, machine translation, or linguistic analysis.
  • Text Preprocessing: Can be used as a preprocessing step to enhance the quality of Arabic text data before further processing by other NLP models.
  • Research and Development: Provides a focused solution for researchers and developers working on Arabic language technologies.