McGill-NLP/AfriqueQwen-4B
McGill-NLP/AfriqueQwen-4B is a 4-billion-parameter causal language model from the AfriqueLLM suite, adapted from Qwen/Qwen3-4B-Base. It is optimized for 20 African languages through continued pre-training on approximately 26 billion tokens while maintaining strong performance in high-resource languages. Its native context length of 32,768 tokens supports long-context tasks such as document-level translation, where the model shows improved capabilities.
AfriqueQwen-4B: African Language Optimized Model
AfriqueQwen-4B is a 4-billion-parameter causal language model developed by McGill-NLP as part of the AfriqueLLM suite. It is built on the Qwen/Qwen3-4B-Base model and has undergone continued pre-training (CPT) on approximately 26 billion tokens of multilingual data targeting 20 African languages. This adaptation improves performance on those languages while preserving capabilities in high-resource languages such as English, French, Portuguese, and Arabic.
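Because this is a base (non-instruction-tuned) model, a few-shot completion prompt is one reasonable way to try it. The sketch below assumes the checkpoint loads through the standard Hugging Face `transformers` auto classes under the repository name given above; the helper names `build_few_shot_prompt` and `generate`, the prompt layout, and the example sentence pair are illustrative assumptions, not something the model card prescribes.

```python
MODEL_ID = "McGill-NLP/AfriqueQwen-4B"  # repository name from this model card


def build_few_shot_prompt(examples, source_text, src_lang="English", tgt_lang="Swahili"):
    """Lay out (source, target) pairs as a completion-style translation prompt.

    The "<language>: <text>" layout is an assumption chosen for illustration.
    """
    blocks = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}\n" for src, tgt in examples]
    # Leave the final target line open so the model completes it.
    blocks.append(f"{src_lang}: {source_text}\n{tgt_lang}:")
    return "\n".join(blocks)


def generate(prompt, max_new_tokens=64):
    """Greedy completion of `prompt`; downloads the model weights on first use."""
    # Imported lazily so the prompt helper can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `generate(build_few_shot_prompt([("Thank you.", "Asante.")], "Good morning."))` would then request a Swahili continuation; output quality will depend on the prompt format and decoding settings.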
Key Capabilities
- Multilingual Proficiency: Adapted for 20 African languages including Swahili, Hausa, Yoruba, and Amharic, alongside strong performance in high-resource languages.
- Robust Base: Builds on the Qwen/Qwen3-4B-Base architecture and inherits its general-purpose capabilities.
- Extended Context: Features a native context length of 32,768 tokens, supporting long-context tasks such as document-level translation.
- Comprehensive Training Data: Trained on a diverse corpus including African monolingual data (22.8B tokens), code (1B tokens), mathematics (~1B tokens), and synthetic data, balanced using UniMax sampling.
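The UniMax balancing mentioned in the last bullet can be sketched as follows. This is a minimal illustration of the general UniMax idea (split the token budget as evenly as possible across languages, but cap how many epochs any one language's data may be repeated); the function names, the epoch cap, and the toy corpus sizes are assumptions for illustration, not the settings used to train AfriqueQwen-4B.

```python
def unimax_budgets(token_counts, total_budget, max_epochs):
    """Allocate a token budget per language, UniMax-style.

    token_counts: dict mapping language -> available tokens
    total_budget: total number of tokens to sample for training
    max_epochs:   maximum number of passes allowed over any one language
    """
    budgets = {}
    remaining = float(total_budget)
    # Visit languages from the smallest corpus to the largest.
    ordered = sorted(token_counts, key=token_counts.get)
    for i, lang in enumerate(ordered):
        # Even share of whatever budget is left among unvisited languages.
        fair_share = remaining / (len(ordered) - i)
        # Small corpora are capped so they are not repeated too often.
        cap = max_epochs * token_counts[lang]
        budgets[lang] = min(fair_share, cap)
        remaining -= budgets[lang]
    return budgets


def sampling_probabilities(budgets):
    """Normalize per-language budgets into sampling probabilities."""
    total = sum(budgets.values())
    return {lang: b / total for lang, b in budgets.items()}


# Toy corpus sizes (hypothetical, arbitrary units).
toy_counts = {"small": 10, "medium": 100, "large": 1000}
toy_budgets = unimax_budgets(toy_counts, total_budget=300, max_epochs=4)
toy_probs = sampling_probabilities(toy_budgets)
```

In the toy case the smallest corpus hits its 4-epoch cap (40 units), and the unused part of its share is redistributed evenly to the two larger corpora (130 units each).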
Good for
- African Language Applications: Ideal for tasks requiring understanding and generation in the 20 supported African languages.
- Multilingual Research: Useful for researchers exploring continued pre-training and language adaptation for low-resource languages.
- Long-Context Tasks: Suitable for applications benefiting from extended context windows, such as document analysis and translation.
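For long-context use, the prompt and the tokens to be generated must together fit inside the 32,768-token window. The sketch below shows one way to check that budget and to fall back to fixed-size windows when a document is too long; the helper names and the windowing strategy are assumptions for illustration, and token counts would come from the model's own tokenizer in practice.

```python
CONTEXT_LENGTH = 32_768  # native context length stated in this model card


def fits_in_context(num_prompt_tokens, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Return True if prompt plus generation stays within the context window."""
    return num_prompt_tokens + max_new_tokens <= context_length


def split_into_windows(token_ids, window_size):
    """Split a token sequence into consecutive windows of at most window_size tokens."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [token_ids[i:i + window_size] for i in range(0, len(token_ids), window_size)]
```

For document-level translation, the window size would typically be chosen to leave room for the translated output as well as the source text.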