McGill-NLP/AfriqueLlama-8B

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jan 6, 2026 | License: cc-by-4.0 | Architecture: Transformer | Open Weights | Warm

McGill-NLP/AfriqueLlama-8B is an 8-billion-parameter causal language model developed by McGill-NLP, based on Meta's Llama 3.1 architecture with a 128,000-token context length. It is adapted for 20 African languages through continued pre-training on approximately 26 billion tokens, while retaining strong performance in high-resource languages. The model targets multilingual applications, particularly African language understanding and generation, and is part of the broader AfriqueLLM suite.


AfriqueLlama-8B: African Language Adaptation of Llama 3.1

McGill-NLP's AfriqueLlama-8B is an 8-billion-parameter causal language model built on the Meta Llama 3.1 8B base. It is a core component of the AfriqueLLM suite and is designed to improve performance across 20 African languages through continued pre-training (CPT) on approximately 26 billion tokens of curated multilingual data. The model retains the native context length of 128,000 tokens.
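As a rough illustration of how a continued-pre-training checkpoint like this is typically used, the sketch below loads the model for plain text completion. It assumes the repository loads through the standard Hugging Face `transformers` causal-LM classes (as Llama 3.1 derivatives normally do), that `torch` and `accelerate` are installed, and that enough GPU memory is available; none of this is confirmed by the page itself. The `generate` helper and the Swahili prompt are hypothetical names and inputs chosen for the example.

```python
MODEL_ID = "McGill-NLP/AfriqueLlama-8B"  # repository name as given on this page


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Complete `prompt` with the model (hypothetical helper, not an official API)."""
    # Heavy dependencies are imported lazily so this file can be imported without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: BF16 inference, matching the training precision
        device_map="auto",           # assumption: `accelerate` is installed
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    # Swahili: "The capital of Kenya is"
    print(generate("Mji mkuu wa Kenya ni"))
```

Because the model is described as continued pre-training on a base model, the sketch passes raw text rather than applying a chat template; that choice is an assumption and may need adjusting for a particular deployment.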

Key Capabilities

  • Multilingual Proficiency: Adapted for 20 specific African languages including Swahili, Hausa, Yoruba, Amharic, and Zulu, alongside strong performance in high-resource languages like English, French, Portuguese, and Arabic.
  • Robust Training: Utilizes a diverse training corpus comprising African monolingual data (22.8B tokens), code (1B tokens from CornStack-Python), mathematics (1B tokens from FineMath-4+), and synthetic data (324M tokens).
  • Performance Improvement: Reports an uplift of +14.1% (39.9%) over the Llama 3.1-8B base model on a suite of African language benchmarks (AfriMGSM, AfriMMLU, AfriXNLI, Belebele, FLORES, INJONGO, SIB-200).
  • Optimized Training: Trained with UniMax sampling for balanced data distribution and BF16 mixed precision, on 64 NVIDIA H100 GPUs using DeepSpeed and Flash Attention 3.
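To make the UniMax sampling mentioned above more concrete, here is a minimal sketch of the allocation procedure as it is generally described: the budget is split as evenly as possible across languages, while no language is repeated for more than a fixed number of epochs. The per-language counts, the budget, and the epoch cap below are hypothetical values chosen for illustration; the page does not state the settings used for AfriqueLlama-8B.

```python
def unimax_weights(counts, budget, max_epochs):
    """Return per-language sampling probabilities under a UniMax-style allocation.

    counts:     mapping of language -> available corpus size (tokens or characters)
    budget:     total training budget in the same unit
    max_epochs: maximum number of passes allowed over any single language
    """
    remaining = float(budget)
    allocation = {}
    # Visit languages from smallest to largest corpus so that capped
    # low-resource languages release their unused share to the larger ones.
    ordered = sorted(counts, key=counts.get)
    for i, lang in enumerate(ordered):
        fair_share = remaining / (len(ordered) - i)  # even split of what is left
        cap = counts[lang] * max_epochs              # epoch limit for this language
        allocation[lang] = min(fair_share, cap)
        remaining -= allocation[lang]
    total = sum(allocation.values())
    return {lang: amount / total for lang, amount in allocation.items()}


# Hypothetical corpus sizes (not taken from the model card).
counts = {"swa": 4_000, "hau": 2_000, "yor": 500, "zul": 100}
weights = unimax_weights(counts, budget=6_000, max_epochs=4)

for lang, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{lang}: {weight:.3f}")
```

In this toy setup the smallest corpus hits its epoch cap and receives a smaller share, while the remaining languages end up with equal weights regardless of their raw size, which is the balancing effect the bullet refers to.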

Good For

  • Applications requiring strong language understanding and generation in African languages.
  • Developers building multilingual systems targeting African linguistic contexts.
  • Research and development in low-resource language NLP, particularly for African languages.