McGill-NLP/AfriqueLlama-8B
McGill-NLP/AfriqueLlama-8B is an 8 billion parameter causal language model developed by McGill-NLP, based on Llama 3.1. It has been specifically adapted through continued pre-training on approximately 26 billion tokens to enhance performance across 20 African languages, while maintaining strong capabilities in high-resource languages. The model features a native context length of 128,000 tokens and is optimized for multilingual applications, particularly in African linguistic contexts.
Model Overview
McGill-NLP/AfriqueLlama-8B is an 8 billion parameter causal language model from the AfriqueLLM suite, developed by McGill-NLP. It is built on the Llama 3.1-8B architecture and has undergone continued pre-training (CPT) on approximately 26 billion tokens of multilingual data. This adaptation improves its performance in 20 African languages while maintaining proficiency in high-resource languages such as English, French, Portuguese, and Arabic.
Key Capabilities
- Multilingual Proficiency: Adapted for 20 African languages (e.g., Swahili, Hausa, Yoruba, Amharic) and four high-resource languages (English, French, Portuguese, Arabic).
- Extended Context Window: Features a native context length of 128,000 tokens.
- Robust Training: Trained on a diverse corpus including African monolingual data (22.8B tokens), code (1B tokens from CornStack-Python), mathematics (~1B tokens from FineMath-4+), and synthetic data, using UniMax sampling for balanced distribution.
- Performance Improvement: Reports a +14.1% (39.9%) overall score improvement on multilingual benchmarks over its base model, Llama 3.1-8B, on evaluations such as AfriMGSM, AfriMMLU, and FLORES.
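The UniMax sampling mentioned above can be illustrated with a small sketch. It follows the general UniMax idea of giving every language an equal share of the training budget while capping how many times a small corpus may be repeated. The language codes, token counts, budget, and epoch cap below are hypothetical placeholders, not the values used to train AfriqueLlama-8B, and the function names are illustrative only.

```python
def unimax_budgets(token_counts, total_budget, max_epochs):
    """Allocate a token budget per language, UniMax-style.

    token_counts: dict mapping language -> available tokens (hypothetical values).
    total_budget: total number of tokens to train on.
    max_epochs:   maximum number of passes allowed over any single corpus.
    """
    budgets = {}
    remaining = float(total_budget)
    # Visit languages from the smallest corpus to the largest, so that budget
    # left unused by capped small corpora is passed on to the larger ones.
    ordered = sorted(token_counts, key=token_counts.get)
    for i, lang in enumerate(ordered):
        fair_share = remaining / (len(ordered) - i)  # uniform split of what is left
        cap = token_counts[lang] * max_epochs        # repetition limit for this corpus
        budgets[lang] = min(fair_share, cap)
        remaining -= budgets[lang]
    return budgets


def sampling_probs(budgets):
    """Normalize per-language budgets into sampling probabilities."""
    total = sum(budgets.values())
    return {lang: b / total for lang, b in budgets.items()}


# Hypothetical corpus sizes (in millions of tokens), for illustration only.
counts = {"low": 10, "mid": 100, "high": 1000}
budgets = unimax_budgets(counts, total_budget=600, max_epochs=4)
probs = sampling_probs(budgets)
```

With these placeholder numbers, the smallest corpus is capped at four passes (40 million tokens), and the two larger corpora split the rest evenly at 280 million tokens each.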
Good For
- Applications requiring strong language understanding and generation in a wide array of African languages.
- Research and development in low-resource language NLP.
- Tasks benefiting from a large context window and robust reasoning capabilities, supported by its code and mathematics training.
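A minimal sketch of text generation with the model, assuming it loads through the standard Hugging Face `transformers` causal language model classes and that `torch` and `accelerate` are installed (the latter for `device_map="auto"`). The helper name `generate`, the dtype, and the generation settings are illustrative choices, not recommendations from the model authors.

```python
MODEL_ID = "McGill-NLP/AfriqueLlama-8B"


def generate(prompt, max_new_tokens=64):
    """Complete `prompt` with the model (hypothetical helper, not an official API)."""
    # Imported lazily so that defining this helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit the target hardware
        device_map="auto",           # assumption: `accelerate` is installed
    )
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Habari ya leo ni")` would download the weights on first use and return the prompt followed by the model's continuation. Because the card describes a causal language model adapted through continued pre-training, plain-text completion prompts are used here rather than a chat template.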