McGill-NLP/AfriqueLlama-8B

Hosted on Hugging Face
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Jan 6, 2026 · License: cc-by-4.0 · Architecture: Transformer · Open weights

AfriqueLlama-8B by McGill-NLP is an 8-billion-parameter causal language model in the AfriqueLLM suite, adapted from Llama 3.1 8B. It underwent continued pre-training on 29.6 billion tokens spanning 20 African languages alongside high-resource languages, targeting stronger performance in low-resource linguistic contexts. The model retains strong capabilities in high-resource languages while improving significantly on African-language benchmarks, making it well suited to multilingual applications centered on African languages.


AfriqueLlama-8B: Multilingual Model for African Languages

AfriqueLlama-8B, developed by McGill-NLP, is an 8-billion-parameter causal language model initialized from Meta's Llama 3.1 8B. It is a key component of the AfriqueLLM suite, which adapts open language models for improved performance across 20 African languages through continued pre-training (CPT).
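As a quick start, the model loads with Hugging Face transformers like any Llama-family checkpoint. This is a minimal sketch, not an official snippet from the model card: the dtype, device settings, and the Swahili prompt are illustrative assumptions.

```python
# Minimal text-generation sketch with Hugging Face transformers.
# dtype/device settings and the Swahili prompt are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "McGill-NLP/AfriqueLlama-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # adjust to your hardware
    device_map="auto",
)

prompt = "Habari ya leo:"  # Swahili: "Today's news:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base (continued-pre-trained) model rather than an instruction-tuned one, expect completion-style behavior rather than chat-style responses.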

Key Capabilities and Features

  • Multilingual Adaptation: Specifically adapted for 20 African languages (e.g., Swahili, Hausa, Yoruba, Amharic) while retaining proficiency in high-resource languages like English, French, Portuguese, and Arabic.
  • Extensive Continued Pre-training: Underwent CPT on 29.6 billion tokens of curated multilingual data, including African monolingual data, code (CornStack-Python), mathematics (FineMath-4+), and GPT-4.1-translated synthetic data.
  • Balanced Data Distribution: Uses UniMax sampling to balance the training mix, capping high-resource languages and upsampling lower-resource ones for effective learning (a sketch of the idea follows this list).
  • Performance Improvement: Gains +14.7 points (a 42.2% relative improvement) over the base Llama 3.1 8B across a suite of multilingual benchmarks, with particularly strong results on tasks such as FLORES and INJONG.
  • Context Length: Features a native context length of 8,192 tokens, extendable with RoPE scaling (a load-time override is sketched below).
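
The UniMax scheme referenced above spreads a total token budget uniformly across languages while capping how many epochs of any single corpus are consumed, so abundant languages stop growing and the leftover budget flows to the smaller ones. A minimal sketch of the allocation idea (not the AfriqueLLM training code; the corpus sizes are hypothetical):

```python
# UniMax-style budget allocation (after Chung et al., 2023), sketched:
# spread the token budget uniformly over languages, but never take more
# than `max_epochs` passes over any language's corpus; leftover budget
# is redistributed among the languages that are not yet capped.
def unimax_proportions(token_counts, total_budget, max_epochs=1.0):
    langs = sorted(token_counts, key=token_counts.get)  # smallest corpora first
    remaining_budget, remaining_langs = total_budget, len(langs)
    allocation = {}
    for lang in langs:
        fair_share = remaining_budget / remaining_langs
        allocation[lang] = min(fair_share, token_counts[lang] * max_epochs)
        remaining_budget -= allocation[lang]
        remaining_langs -= 1
    return {lang: tokens / total_budget for lang, tokens in allocation.items()}

# Hypothetical corpus sizes in tokens (not the actual AfriqueLLM mix):
counts = {"en": 10_000_000_000, "sw": 800_000_000, "yo": 90_000_000}
print(unimax_proportions(counts, total_budget=2_000_000_000))
# -> Yoruba is fully used (capped at one epoch); English is held to its fair share.
```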

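For contexts beyond the native 8,192-token window, transformers accepts a rope_scaling override at load time, which is forwarded to the model config. The scaling type and factor below are illustrative assumptions (a linear factor of 4.0 stretching toward roughly 32k positions); the model card does not prescribe specific settings:

```python
from transformers import AutoModelForCausalLM

# Illustrative RoPE scaling override; the "linear"/4.0 choice is an
# assumption, not a documented recommendation for this model.
model = AutoModelForCausalLM.from_pretrained(
    "McGill-NLP/AfriqueLlama-8B",
    rope_scaling={"rope_type": "linear", "factor": 4.0},
)
```

Validate longer-context quality empirically before relying on it, since position extrapolation can degrade output.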
Good For

  • Applications requiring strong performance in African languages: Ideal for tasks such as text generation, translation, and understanding in low-resource African linguistic contexts (a few-shot translation sketch follows this list).
  • Multilingual systems: Suitable for developers building applications that need to operate across a diverse set of languages, including both African and major global languages.
  • Research and development: Provides a robust base model for further fine-tuning or research into African language NLP.
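
Because AfriqueLlama-8B is a base model rather than an instruction-tuned one, tasks like translation are typically elicited with few-shot prompting. A hypothetical English-to-Swahili example; the demonstration pairs are illustrative placeholders, not drawn from the training data:

```python
# Few-shot translation prompting sketch for a base (non-instruct) model.
# The demonstration pairs are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "McGill-NLP/AfriqueLlama-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

few_shot_prompt = (
    "English: Good morning.\nSwahili: Habari za asubuhi.\n\n"
    "English: Where is the market?\nSwahili: Soko liko wapi?\n\n"
    "English: The weather is nice today.\nSwahili:"
)

inputs = tokenizer(few_shot_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
# Decode only the newly generated tokens (the model's translation).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```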