MaLA-LM/emma-500-llama3-8b-bi

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: May 10, 2025 · License: llama3 · Architecture: Transformer · Status: Warm

MaLA-LM/emma-500-llama3-8b-bi is an 8 billion parameter multilingual language model developed by MaLA-LM, built on the Llama 3 8B architecture. It is continually pre-trained on the MaLA Corpus, which covers 546 languages, and augmented with bilingual translation data across 2,500+ language pairs. The model targets massively multilingual NLP tasks such as machine translation and commonsense reasoning, particularly for low-resource languages.


EMMA-500 Llama 3 8B Bilingual Model

EMMA-500 Llama 3 8B is a multilingual language model from MaLA-LM, obtained by continually pre-training Llama 3 8B. It leverages the extensive MaLA Corpus, which covers over 500 languages, augmented with code, instruction data, and academic papers. This specific emma-500-llama3-8b-bi variant is distinguished by its inclusion of bilingual translation data across more than 2,500 language pairs, in addition to monolingual data.
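
A minimal loading-and-generation sketch using Hugging Face transformers; the dtype, device placement, and prompt format here are assumptions rather than settings taken from the model card:

```python
# Hedged sketch: standard transformers causal-LM loading path (assumed; check
# the official model card for the recommended setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaLA-LM/emma-500-llama3-8b-bi"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the 8B weights on one GPU
    device_map="auto",
)

# Plain text completion: the model is continually pre-trained, not chat-tuned,
# so prompt it as a base LM.
prompt = "Translate English to Swahili:\nEnglish: The weather is nice today.\nSwahili:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```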

Key Capabilities

  • Massively Multilingual: Supports 546 languages with over 100k tokens each, making it highly effective for diverse linguistic tasks.
  • Enhanced Language Representation: Improves representation, especially for low-resource languages, through continual pre-training on a 671-billion-token dataset.
  • Bilingual Translation: Optimized for machine translation and cross-lingual understanding due to its bilingual data mix (see the prompt sketch after this list).
  • Diverse Data Mix: Trained on a comprehensive mix of code, books, instruction data, and academic papers, enhancing its general multilingual NLP capabilities.
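
Because this is a base model rather than a chat model, bilingual translation is typically elicited with few-shot completion prompts. A hedged sketch of such prompt construction; the language pair and example sentences are illustrative, not from the model card:

```python
# Hypothetical few-shot translation prompt for a base LM. Example pairs are
# illustrative placeholders, not official evaluation data.
few_shot_pairs = [
    ("Good morning.", "Bonjou."),            # English -> Haitian Creole (illustrative)
    ("Thank you very much.", "Mèsi anpil."),
]

def build_prompt(pairs, source_sentence):
    """Concatenate parallel examples into a completion-style translation prompt."""
    lines = [f"English: {src}\nHaitian Creole: {tgt}" for src, tgt in pairs]
    lines.append(f"English: {source_sentence}\nHaitian Creole:")
    return "\n\n".join(lines)

prompt = build_prompt(few_shot_pairs, "Where is the nearest hospital?")
# Feed `prompt` to model.generate as in the loading sketch above.
```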

Good For

  • Massively Multilingual NLP tasks: Particularly strong in areas like machine translation and text classification across many languages.
  • Low-resource language applications: Designed to improve performance in languages with limited existing data.
  • Research in multilingual LLMs: Provides a robust base for exploring language adaptation and cross-lingual transfer learning (a per-language perplexity probe is sketched after this list).
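
For coverage studies, one simple probe is per-language perplexity on held-out text. A hedged sketch that reuses the `model` and `tokenizer` loaded in the earlier snippet; the sample sentences and language codes are placeholders, and a real study would use a proper multilingual evaluation set:

```python
# Hedged sketch: per-language perplexity as a rough probe of multilingual coverage.
import math
import torch

samples = {
    "swh": "Habari ya asubuhi, rafiki yangu.",   # placeholder Swahili sentence
    "fra": "Bonjour, comment allez-vous ?",       # placeholder French sentence
}

model.eval()
for lang, text in samples.items():
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Passing labels=input_ids makes the model return mean cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    print(f"{lang}: perplexity = {math.exp(loss.item()):.1f}")
```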

Limitations

  • May show performance regressions on some tasks and on high-resource languages compared to models optimized specifically for them.
  • Not recommended for real-world, high-stakes scenarios without further fine-tuning and validation.