MaLA-LM/emma-500-llama3.1-8b-bi

Text generation · Model size: 8B · Quantization: FP8 · Context length: 8k · Concurrency cost: 1 · Published: May 10, 2025 · License: llama3 · Architecture: Transformer

EMMA-500 Llama 3.1 8B is an 8-billion-parameter multilingual language model developed by MaLA-LM, continually pre-trained on the Llama 3.1 8B architecture. It supports 546 languages with substantial training data each, leveraging the MaLA Corpus, which includes bilingual translation data across 2,500+ language pairs. The model excels at multilingual tasks such as commonsense reasoning, machine translation, and text classification, particularly for low-resource languages.


EMMA-500 Llama 3.1 8B: Massively Multilingual Adaptation

EMMA-500 Llama 3.1 8B is an 8 billion parameter language model from MaLA-LM, continually pre-trained on the Llama 3.1 8B base architecture. Its primary focus is enhancing language representation, especially for low-resource languages, by leveraging a diverse and extensive multilingual dataset.

Key Capabilities & Features

  • Massive Multilingual Support: Supports 546 languages with over 100k tokens of training data each, and includes bilingual translation data for over 2,500 language pairs.
  • Continual Pre-training: Built upon Llama 3.1 8B, it undergoes continual pre-training using the comprehensive MaLA Corpus.
  • Diverse Data Mix: Trained on 671 billion tokens from a bilingual data mix spanning code, books, instruction data, and academic papers.
  • Task Performance: Designed to excel in multilingual tasks such as commonsense reasoning, machine translation, and text classification.
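Since EMMA-500 is a continually pre-trained base model rather than an instruction-tuned chat model, prompting it as a plain text completer is the natural usage. The sketch below shows one way this might look with Hugging Face `transformers`; the repository ID comes from this card, while the generation settings are illustrative assumptions.

```python
# Sketch: loading EMMA-500 Llama 3.1 8B via Hugging Face transformers.
# The repo ID is taken from this card; sampling settings are illustrative.
MODEL_ID = "MaLA-LM/emma-500-llama3.1-8b-bi"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Plain completion: EMMA-500 is a base model, not a chat model,
    so we feed it raw text and let it continue the sequence."""
    # Imported lazily so the module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("The capital of Kenya is"))
```

Note that running the 8B model locally requires substantial GPU memory; `device_map="auto"` lets `transformers` place the weights across available devices.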

Use Cases & Considerations

  • Ideal for: Massively multilingual NLP tasks, particularly machine translation and applications involving low-resource languages.
  • Limitations: Performance may regress on some tasks and on high-resource languages compared to monolingual models, and the model is not recommended for real-world, high-stakes scenarios.
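For the machine-translation use case above, a few-shot completion prompt is one plausible way to elicit translations from a base model like this. The `Source:`/`Target:` layout below is a common convention, not a documented EMMA-500 template, and the Swahili example pair is purely illustrative.

```python
# Sketch: building a few-shot translation prompt for a base (non-chat) model.
# The language-labeled line format is an assumption, not an official template.
def build_translation_prompt(src_lang, tgt_lang, text, examples=()):
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    for src, tgt in examples:  # optional in-context example pairs
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    lines.append(f"{src_lang}: {text}")
    lines.append(f"{tgt_lang}:")  # the model completes the translation here
    return "\n".join(lines)

prompt = build_translation_prompt(
    "English", "Swahili", "Good morning.",
    examples=[("Thank you.", "Asante.")],
)
```

The resulting string would then be passed to the model as a plain completion prompt; decoding stops naturally at the next language-label line or a length limit.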