MBZUAI-Paris/Atlas-Chat-27B

TEXT GENERATIONConcurrency Cost:2Model Size:27BQuant:FP8Ctx Length:32kPublished:Sep 22, 2024License:gemmaArchitecture:Transformer0.0K Cold

Atlas-Chat-27B is a 27 billion parameter instruction-tuned language model developed by MBZUAI France Lab, based on the Gemma 2 architecture. It is specifically optimized for generating fluent Moroccan Darija text, excelling in tasks like question answering, summarization, and translation within this low-resource dialect. This model is designed to make advanced AI accessible for Darija speakers and can be deployed in resource-constrained environments.

Loading preview...

Atlas-Chat-27B: Advanced Darija Language Model

Atlas-Chat-27B is the largest model in the Atlas-Chat family, developed by MBZUAI France Lab as part of the Jais initiative. This 27 billion parameter model is instruction-tuned specifically for Darija, the colloquial Arabic of Morocco, building upon the Gemma 2 architecture. It aims to make advanced AI accessible to Darija speakers and promote innovation in this low-resource language.

Key Capabilities

  • Darija Language Generation: Excels in producing fluent and contextually rich Moroccan Darija text.
  • Instruction-Following: Designed for various applications including question answering, summarization, and translation in Darija.
  • Resource-Efficient Deployment: Despite its size, it's optimized for deployment in environments like laptops, desktops, or personal cloud setups.
  • Multilingual Adaptation: Trained on diverse datasets, including synthetic instructions tailored to Moroccan culture, public Moroccan Arabic datasets, and translated English/multilingual instruction-tuning datasets.

Performance Highlights

Atlas-Chat-27B demonstrates strong performance across Darija-specific benchmarks, consistently outperforming other models in its class. For instance, it achieves 61.95 on DarijaMMLU, 48.37 on DarijaHellaSwag, and 75.67 on Belebele Ary. In standard NLP tasks, it shows significant results in translation (e.g., 29.55 BLEU on DODa-10k) and transliteration (33.03 BLEU on DODa-10k), as well as strong summarization and sentiment analysis capabilities.

Intended Use Cases

  • Developing conversational agents and chatbots that operate in Darija.
  • Facilitating translation, summarization, and content generation in informal Moroccan dialect.
  • Supporting cultural research related to Morocco and its language.