McGill-NLP/AfriqueLlama-8B
McGill-NLP/AfriqueLlama-8B is an 8-billion-parameter causal language model developed by McGill-NLP. It is built on Meta's Llama 3.1 8B base model and has a 128,000-token context length. It is adapted to 20 African languages through continued pre-training on approximately 26 billion tokens, while retaining strong performance in high-resource languages. The model targets multilingual applications, particularly African language understanding and generation, and is part of the broader AfriqueLLM suite.
AfriqueLlama-8B: African Language Adaptation of Llama 3.1
McGill-NLP's AfriqueLlama-8B is an 8-billion-parameter causal language model built on the Meta Llama 3.1 8B base model. It is a core component of the AfriqueLLM suite and is designed to improve performance across 20 African languages through continued pre-training (CPT) on approximately 26 billion tokens of curated multilingual data. The model retains the base model's native context length of 128,000 tokens.
Key Capabilities
- Multilingual Proficiency: Adapted to 20 African languages, including Swahili, Hausa, Yoruba, Amharic, and Zulu, while retaining strong performance in high-resource languages such as English, French, Portuguese, and Arabic.
- Robust Training: Trained on a diverse corpus comprising African monolingual data (22.8B tokens), code (1B tokens from CornStack-Python), mathematics (1B tokens from FineMath-4+), and synthetic data (324M tokens).
- Performance Improvement: Reaches an average score of 39.9% on a suite of African language benchmarks (AfriMGSM, AfriMMLU, AfriXNLI, Belebele, FLORES, INJONG, SIB-200), a reported gain of +14.1% over the base Llama 3.1 8B model.
- Optimized Training: Uses UniMax sampling to balance the data distribution across languages and BF16 mixed precision, with training run on 64 NVIDIA H100 GPUs using DeepSpeed and FlashAttention-3.
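As a quick check on the data mix, the short Python sketch below totals the itemized token counts and computes each component's share. The numbers are copied from the training-corpus breakdown above; the dictionary keys are illustrative labels only. The itemized components add up to roughly 25.1B tokens, somewhat below the ~26B headline figure, and the card does not break down the difference.

```python
# Token counts from the training-corpus breakdown, in billions of tokens.
# Keys are illustrative labels, not official dataset identifiers.
corpus = {
    "african_monolingual": 22.8,
    "code_cornstack_python": 1.0,
    "math_finemath_4plus": 1.0,
    "synthetic": 0.324,
}

# Total of the itemized components.
total = sum(corpus.values())

# Fraction of the itemized total contributed by each component.
shares = {name: count / total for name, count in corpus.items()}

print(f"itemized total: {total:.3f}B tokens")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

By this count, African monolingual text makes up about nine tenths of the itemized data, with code, math, and synthetic data together contributing the rest.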
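UniMax sampling (Chung et al., 2023) spreads a fixed training budget as evenly as possible across languages while capping each language at a maximum number of passes (epochs) over its own data. This avoids heavily repeating the smallest corpora. The Python sketch below follows that published procedure: languages are visited from smallest to largest, each receives an equal share of the remaining budget or its epoch cap, whichever is smaller, and any unused share flows to the larger languages. The card does not give AfriqueLlama's actual budget, epoch cap, or per-language counts, so every number in the usage example is hypothetical.

```python
def unimax_allocation(token_counts, budget, max_epochs):
    """Split `budget` tokens across languages following the UniMax procedure.

    token_counts: mapping of language -> available tokens
    budget: total number of training tokens to allocate
    max_epochs: maximum passes allowed over any single language's data
    """
    # Visit languages from the smallest corpus to the largest.
    order = sorted(token_counts, key=token_counts.get)
    remaining = budget
    allocation = {}
    for i, lang in enumerate(order):
        languages_left = len(order) - i
        # Equal share of whatever budget has not been assigned yet.
        uniform_share = remaining / languages_left
        # Never exceed max_epochs passes over this language's data.
        allocation[lang] = min(uniform_share, token_counts[lang] * max_epochs)
        remaining -= allocation[lang]
    return allocation


def sampling_probabilities(allocation):
    """Convert per-language token budgets into sampling probabilities."""
    total = sum(allocation.values())
    return {lang: tokens / total for lang, tokens in allocation.items()}


# Hypothetical corpus sizes (billions of tokens), budget, and epoch cap.
counts = {"swa": 5.0, "hau": 3.0, "zul": 1.0, "amh": 0.5}
alloc = unimax_allocation(counts, budget=8.0, max_epochs=1)
probs = sampling_probabilities(alloc)
print(alloc)
print(probs)
```

With these made-up inputs, Amharic and Zulu are capped at one pass over their data, and the leftover budget shifts to Hausa and Swahili, with Swahili receiving the largest allocation.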
Good For
- Applications requiring strong language understanding and generation in African languages.
- Developers building multilingual systems targeting African linguistic contexts.
- Research and development in low-resource language NLP, particularly for African languages.