GemMaroc-27B: Darija Proficiency with Green AI

GemMaroc-27B is a 27 billion parameter large language model developed by Abderrahman Skiredj, fine-tuned from Google's Gemma 3 architecture. Its primary goal is to unlock Moroccan Darija proficiency, addressing the underserved population of over 36 million Moroccan Arabic speakers. This model stands out for its "minimal-data, green-AI" training recipe, which efficiently adds fluent Darija generation while maintaining Gemma-27B's robust reasoning capabilities.

Key Capabilities

Fluent Darija Generation: Specifically trained to understand and generate Moroccan Darija instructions.
Cross-Lingual Reasoning: Preserves strong reasoning abilities from its Gemma 3 base, with 20% of training data kept in English for robustness.
Efficient Training: Achieves high Darija competence with a significantly lower compute budget (48 GPU·h) compared to similar models, emphasizing a "quality-over-quantity" approach to data.
Instruction Following: Supervised fine-tuning on 50K high-quality Darija/English instructions.

Good For

Inclusive AI Applications: Developing LLM-powered tools and services for Moroccan Arabic speakers.
Reasoning Tasks: Leveraging its strong reasoning foundation for complex problem-solving.
Resource-Efficient Deployment: Suitable for scenarios where energy consumption and training costs are a concern.
Multilingual Chatbots: Creating conversational agents that can fluently interact in both Darija and English.

Benchmark Highlights

GemMaroc-27B demonstrates competitive performance against Atlas-Chat-27B, achieving 60.5% on Darija HellaSwag and 84.2% on GSM8K @5, indicating strong reasoning and language understanding in both Darija and English contexts.

Overview

GemMaroc-27B: Darija Proficiency with Green AI

Key Capabilities

Good For

Benchmark Highlights

Full Model Card (README)