BounharAbdelaziz/Al-Atlas-LLM-0.5B

Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Feb 19, 2025 | Architecture: Transformer | Gated | Cold

Al-Atlas-LLM is a 0.5 billion parameter transformer-based language model developed by BounharAbdelaziz and trained on 155 million tokens of authentic Moroccan Darija content. It is presented as the first dedicated foundation model for Morocco's primary spoken dialect, built to capture its cultural context and local expressions. It targets tasks that require understanding and generating Moroccan Arabic, such as chatbots, content creation, and sentiment analysis in local markets, and it is described as having a 2048-token context window.


Al-Atlas-LLM: The First Dedicated Moroccan Darija LLM

Al-Atlas-LLM, developed by BounharAbdelaziz, is a 0.5 billion parameter transformer-based language model focused on Moroccan Darija. It is presented as the first foundation model trained specifically for Morocco's primary spoken dialect, addressing a significant gap in linguistic AI. The model was trained on a curated dataset of 155 million tokens drawn from social media, transcribed speech, online forums, and local media. To keep the corpus representative of authentic Darija, Modern Standard Arabic and content in other dialects were filtered out.

Key Capabilities

  • Dedicated Darija Understanding & Generation: Specifically designed to comprehend and produce text in Moroccan Arabic, including its unique cultural nuances and expressions.
  • High-Quality Data Foundation: Built upon a 155M token corpus of pure Darija, ensuring linguistic accuracy and relevance.
  • Compact & Efficient: A 0.5B parameter model with a 2048-token context window, suitable for specialized applications.
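
To make the "compact" claim concrete, here is a rough back-of-the-envelope sketch of the memory needed just to hold the weights, using the 0.5B model size and BF16 quantization listed on this page. The helper name `weight_memory_gib` is hypothetical, the nominal 0.5B figure may differ from the exact parameter count, and the estimate ignores activations and the KV cache.

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each value in 16 bits


def weight_memory_gib(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Approximate memory for the weights alone, in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / 2**30


# 0.5B is the nominal size from this page; the exact count is an assumption.
print(f"{weight_memory_gib(0.5e9):.2f} GiB")  # prints "0.93 GiB"
```

At roughly 1 GB of weights, the model should fit comfortably on most consumer GPUs and many CPUs, although actual usage will be higher once inference overhead is included.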

Good for

  • Chatbots for Moroccan Users: Enabling natural and culturally relevant conversations.
  • Content Generation in Darija: Creating localized text for various platforms.
  • Text Classification & Sentiment Analysis: Analyzing Moroccan content for market insights or user feedback.
  • Customer Service Automation: Providing support in the native dialect.
  • Educational Tools: Developing resources for Darija speakers.
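
For the use cases above, a minimal generation sketch might look like the following. It assumes the model can be loaded from the Hugging Face Hub under the identifier shown at the top of this page via the `transformers` library, which this page does not confirm; since the page marks the model as gated, an access token may also be required. The helper names `clamp_new_tokens` and `generate` are hypothetical, the sampling settings are illustrative rather than recommended values, and the 2048-token default follows the description here (the metadata line lists 32k, so adjust `context_window` if that figure is the correct one).

```python
MODEL_ID = "BounharAbdelaziz/Al-Atlas-LLM-0.5B"  # identifier taken from this page
CONTEXT_WINDOW = 2048  # per the description; the metadata line lists 32k


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     context_window: int = CONTEXT_WINDOW) -> int:
    """Limit the output length so prompt + output fits the context window."""
    if prompt_tokens >= context_window:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested, context_window - prompt_tokens)


def generate(prompt: str, requested_new_tokens: int = 128) -> str:
    """Generate a continuation for a Darija prompt (assumes Hub availability)."""
    # Imported lazily so clamp_new_tokens works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    budget = clamp_new_tokens(inputs["input_ids"].shape[1], requested_new_tokens)

    # Sampling parameters are illustrative assumptions, not tuned values.
    output_ids = model.generate(**inputs, max_new_tokens=budget,
                                do_sample=True, temperature=0.7, top_p=0.9)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate(...)` with a Darija prompt returns the prompt followed by the model's continuation. Because this is described as a foundation model, framing the prompt as text to be continued is likely to work better than issuing chat-style instructions.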