Model Overview
McGill-NLP/AfriqueQwen-8B is an 8-billion-parameter causal language model from McGill-NLP, built on Qwen3-8B-Base. As part of the AfriqueLLM suite, it was adapted to 20 African languages through continued pre-training (CPT) on 27.5 billion tokens, while retaining strong capabilities in high-resource languages (English, French, Portuguese, and Arabic) and thereby mitigating catastrophic forgetting.
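For orientation, here is a minimal loading-and-generation sketch using the Hugging Face transformers API. The model id comes from this card; the precision/device settings and the Swahili prompt are illustrative assumptions, and plain-text prompting is used because the card describes a continued-pretrained base model rather than a chat model.

```python
# Minimal sketch: load AfriqueQwen-8B and complete a plain-text prompt.
# Assumes transformers and accelerate are installed; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "McGill-NLP/AfriqueQwen-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the checkpoint's native precision
    device_map="auto",   # place weights on available GPU(s)/CPU
)

# Base-model completion (no chat template): a hypothetical Swahili prompt.
prompt = "Habari ya leo ni"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```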
Key Capabilities
- Multilingual Adaptation: Adapted for 20 African languages (e.g., Swahili, Amharic, Hausa, Yoruba) alongside 4 high-resource languages.
- Robust Base Model: Builds on the strong performance of Qwen3 models, which have shown superior preservation of high-resource-language capabilities after CPT.
- Extended Context: Features a native context length of 32,768 tokens, supporting long-context tasks such as document-level translation (see the sketch after this list).
- Diverse Training Data: Trained on a curated dataset including African monolingual data (FineWeb2, WURA, MADLAD-400), 1 billion code tokens (CornStack-Python), 1 billion mathematics tokens (FineMath-4+), and 324 million synthetic tokens.
- Performance Improvement: Demonstrates significant gains on African-language benchmarks: AfriqueQwen-8B scores +26.1 points over the base Qwen3-8B on the overall AfriqueLLM evaluation suite, a 78.8% relative improvement.
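As referenced in the Extended Context item above, the sketch below exercises the 32,768-token window for document-level translation. The zero-shot prompt template, source file name, and generation settings are assumptions for a base (non-chat) model, not an official recipe.

```python
# Hypothetical document-level translation within the 32,768-token window.
# The prompt format and file name are assumptions, not an official recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "McGill-NLP/AfriqueQwen-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A long Swahili source document (hypothetical path).
with open("report_sw.txt", encoding="utf-8") as f:
    document = f.read()

prompt = (
    "Translate the following Swahili document into English.\n\n"
    f"Swahili:\n{document}\n\nEnglish:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
n_prompt = inputs["input_ids"].shape[1]
assert n_prompt <= 32_768, "prompt exceeds the native context window"

# Greedy decoding keeps the translation deterministic; tune as needed.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(tokenizer.decode(outputs[0][n_prompt:], skip_special_tokens=True))
```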
Good For
- Applications requiring strong language understanding and generation in African languages.
- Tasks benefiting from long context windows, such as document summarization or translation.
- Developers seeking a multilingual foundation model with a focus on under-resourced languages.