Etherll/Tashkeel-700M
TEXT GENERATIONConcurrency Cost:1Model Size:0.7BQuant:BF16Ctx Length:32kPublished:Aug 12, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold
Etherll/Tashkeel-700M is a 700 million parameter model specifically designed for Arabic diacritization (Tashkeel). It is a fine-tuned version of the LiquidAI/LFM2-700M base model, trained on the arbml/tashkeela dataset. This model excels at accurately adding diacritics to unvocalized Arabic text, making it suitable for natural language processing tasks requiring correctly vocalized Arabic.
Loading preview...
Tashkeel-700M: Arabic Diacritization Model
Etherll/Tashkeel-700M is a specialized language model with 700 million parameters, engineered for the task of Arabic diacritization, also known as Tashkeel. This model was developed by fine-tuning the LiquidAI/LFM2-700M base model. The training utilized the arbml/tashkeela dataset, a dedicated resource for Arabic vocalization.
Key Capabilities
- Accurate Arabic Diacritization: The model's primary function is to add correct diacritics (vowel marks) to unvocalized Arabic text, ensuring proper pronunciation and grammatical structure.
- Fine-tuned Performance: By building upon
LiquidAI/LFM2-700Mand training on a specific dataset, Tashkeel-700M is optimized for its niche task. - Efficient Training: The base LFM2 model was trained using Unsloth and Huggingface's TRL library, indicating an efficient development process.
Good For
- Arabic NLP Applications: Ideal for any application where correctly vocalized Arabic text is crucial, such as text-to-speech systems, machine translation, or linguistic analysis.
- Text Preprocessing: Can be used as a preprocessing step to enhance the quality of Arabic text data before further processing by other NLP models.
- Research and Development: Provides a focused solution for researchers and developers working on Arabic language technologies.