Atlas-Chat-27B: Advanced Darija Language Model
Atlas-Chat-27B is the largest model in the Atlas-Chat family, developed by MBZUAI France Lab as part of the Jais initiative. This 27 billion parameter model is instruction-tuned specifically for Darija, the colloquial Arabic of Morocco, building upon the Gemma 2 architecture. It aims to make advanced AI accessible to Darija speakers and promote innovation in this low-resource language.
Key Capabilities
- Darija Language Generation: Excels in producing fluent and contextually rich Moroccan Darija text.
- Instruction-Following: Designed for various applications including question answering, summarization, and translation in Darija.
- Resource-Efficient Deployment: Despite its size, it's optimized for deployment in environments like laptops, desktops, or personal cloud setups.
- Multilingual Adaptation: Trained on diverse datasets, including synthetic instructions tailored to Moroccan culture, public Moroccan Arabic datasets, and translated English/multilingual instruction-tuning datasets.
Performance Highlights
Atlas-Chat-27B demonstrates strong performance across Darija-specific benchmarks, consistently outperforming other models in its class. For instance, it achieves 61.95 on DarijaMMLU, 48.37 on DarijaHellaSwag, and 75.67 on Belebele Ary. In standard NLP tasks, it shows significant results in translation (e.g., 29.55 BLEU on DODa-10k) and transliteration (33.03 BLEU on DODa-10k), as well as strong summarization and sentiment analysis capabilities.
Intended Use Cases
- Developing conversational agents and chatbots that operate in Darija.
- Facilitating translation, summarization, and content generation in informal Moroccan dialect.
- Supporting cultural research related to Morocco and its language.