AbderrahmanSkiredj1/GemMaroc-27b-it

VISIONConcurrency Cost:2Model Size:27BQuant:FP8Ctx Length:32kPublished:May 18, 2025Architecture:Transformer0.0K Cold

GemMaroc-27b-it by AbderrahmanSkiredj1 is a 27 billion parameter decoder-only Transformer model, based on Google's Gemma 3 architecture, with a context length of 2,048 tokens. It is specifically fine-tuned for Moroccan Darija proficiency using a minimal-data approach, while preserving strong cross-lingual reasoning abilities. This model excels at generating fluent Darija and English instructions, making it ideal for applications targeting the over 36 million speakers of Moroccan Arabic.

Loading preview...

GemMaroc-27B: Darija Proficiency with Green AI

GemMaroc-27B is a 27 billion parameter large language model developed by Abderrahman Skiredj, fine-tuned from Google's Gemma 3 architecture. Its primary goal is to unlock Moroccan Darija proficiency, addressing the underserved population of over 36 million Moroccan Arabic speakers. This model stands out for its "minimal-data, green-AI" training recipe, which efficiently adds fluent Darija generation while maintaining Gemma-27B's robust reasoning capabilities.

Key Capabilities

  • Fluent Darija Generation: Specifically trained to understand and generate Moroccan Darija instructions.
  • Cross-Lingual Reasoning: Preserves strong reasoning abilities from its Gemma 3 base, with 20% of training data kept in English for robustness.
  • Efficient Training: Achieves high Darija competence with a significantly lower compute budget (48 GPU·h) compared to similar models, emphasizing a "quality-over-quantity" approach to data.
  • Instruction Following: Supervised fine-tuning on 50K high-quality Darija/English instructions.

Good For

  • Inclusive AI Applications: Developing LLM-powered tools and services for Moroccan Arabic speakers.
  • Reasoning Tasks: Leveraging its strong reasoning foundation for complex problem-solving.
  • Resource-Efficient Deployment: Suitable for scenarios where energy consumption and training costs are a concern.
  • Multilingual Chatbots: Creating conversational agents that can fluently interact in both Darija and English.

Benchmark Highlights

GemMaroc-27B demonstrates competitive performance against Atlas-Chat-27B, achieving 60.5% on Darija HellaSwag and 84.2% on GSM8K @5, indicating strong reasoning and language understanding in both Darija and English contexts.