Overview
GemMaroc/Qwen2.5-7B-Instruct-darija is a 7-billion-parameter instruction-tuned language model built on the Qwen2.5 architecture. Developed by GemMaroc, it is designed to serve the underserved Moroccan Darija-speaking community of over 36 million speakers. It stands out for its "minimal-data, green-AI" training approach: supervised fine-tuning on a carefully curated 50K high-quality Darija/English instruction set, which preserves the strong reasoning abilities of the base model.
Key Capabilities & Differentiators
- Darija Proficiency: Significantly improves performance on Darija versions of standard benchmarks, including MMLU (52.7%), HellaSwag (45.5%), and GSM8K (69.8%), surpassing the base Qwen2.5-7B-Instruct model.
- Cross-lingual Robustness: Maintains or improves English benchmark scores, such as MMLU (70.0%) and HellaSwag (73.9%), demonstrating effective cross-lingual reasoning.
- Efficiency & Green AI: Achieves competitive Darija scores with minimal energy consumption and a compact 7B parameter size, making it suitable for resource-constrained deployments.
- Instruction Following: Fine-tuned with a TULU-50K reasoning slice translated into Darija, ensuring strong instruction-following capabilities in both Darija and English.
Use Cases
This model is ideal for applications requiring fluent Moroccan Darija generation, cross-lingual understanding, and efficient deployment. Its balanced performance across Darija and English makes it a versatile choice for developers targeting the Moroccan market or needing a compact, capable multilingual LLM.
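As a sketch of how the model might be used in practice, the snippet below loads it through the Hugging Face `transformers` library and generates a reply to a Darija question. The model ID comes from this card; the chat-template flow is the standard Qwen2.5 usage pattern, and the system prompt and generation settings are illustrative assumptions, not recommendations from the model authors.

```python
MODEL_ID = "GemMaroc/Qwen2.5-7B-Instruct-darija"


def build_messages(user_prompt: str) -> list[dict]:
    """Build a chat-format message list; Qwen2.5 models consume it
    via the tokenizer's chat template."""
    return [
        # Assumed generic system prompt; adjust for your application.
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]


def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the lightweight helpers above stay usable
    # without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    # Render the messages into the model's expected prompt format.
    text = tokenizer.apply_chat_template(
        build_messages(user_prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    # A Darija question: "What is the capital of Morocco?"
    print(generate("شنو هي العاصمة ديال المغرب؟"))
```

Note that a 7B model in bfloat16 needs roughly 15 GB of accelerator memory; for resource-constrained deployments, quantized loading (e.g. 4-bit via bitsandbytes) is a common alternative.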