McGill-NLP/AfriqueQwen-14B

Task: Text Generation
Concurrency Cost: 1
Model Size: 14B
Quant: FP8
Ctx Length: 32k
Published: Jan 7, 2026
License: cc-by-4.0
Architecture: Transformer
Open Weights

McGill-NLP/AfriqueQwen-14B is a 14-billion-parameter causal language model from the AfriqueLLM suite, based on Qwen3-14B-Base. It was adapted through continued pre-training on ~26 billion tokens to improve performance across 20 African languages while maintaining strong capabilities in high-resource languages. It supports a native context length of 32,768 tokens and targets understanding and generation tasks in multilingual settings, including document-level translation.

AfriqueQwen-14B: Multilingual LLM for African Languages

AfriqueQwen-14B is a 14-billion-parameter causal language model developed by McGill-NLP as part of the AfriqueLLM suite. Built on Qwen3-14B-Base, it has undergone continued pre-training (CPT) on approximately 26 billion tokens of curated multilingual data covering 20 African languages alongside four high-resource languages (English, French, Portuguese, Arabic).
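
Because the model is a continued-pre-training checkpoint of a base model, a plain-text completion prompt is a reasonable starting assumption (rather than a chat template). The following is a minimal sketch of loading it with the Hugging Face `transformers` library. It assumes the weights are available under the repository id `McGill-NLP/AfriqueQwen-14B` and that `transformers`, `torch`, and `accelerate` are installed. The helper names and the prompt layout are illustrative and do not come from the model card.

```python
MODEL_ID = "McGill-NLP/AfriqueQwen-14B"  # repository id as shown in the page title


def build_translation_prompt(source_lang: str, target_lang: str, text: str) -> str:
    """Format a plain-text completion prompt (hypothetical layout, not from the model card)."""
    return f"{source_lang}: {text}\n{target_lang}:"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and return only the newly generated continuation."""
    # Imported lazily so the prompt helper works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # needs `accelerate`; spreads layers over available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding keeps the sketch deterministic; sampling settings are left to the reader.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate(build_translation_prompt("English", "Swahili", "Good morning."))` would download the weights on first use, so it needs enough disk space and accelerator memory for a 14B-parameter model.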

Key Capabilities

  • Multilingual Proficiency: Significantly enhanced performance across 20 African languages, including Swahili, Hausa, Yoruba, Amharic, and Zulu, while preserving strong capabilities in high-resource languages.
  • Robust Base Model: Builds on the Qwen 3 architecture, which in base-model evaluation showed superior preservation of prior performance and strong results on long-context tasks such as document-level translation.
  • Extensive Context Window: Features a native context length of 32,768 tokens, enabling processing of lengthy documents and complex queries.
  • Diverse Training Data: Trained on a balanced corpus including ~22.8B tokens of African monolingual data (FineWeb2, WURA, MADLAD-400), ~1B tokens of code (CornStack-Python), ~1B tokens of mathematics (FineMath-4+), and ~324M tokens of GPT-4.1 translated synthetic data.
  • Performance Improvement: Achieves a substantial +23.3% (57.8%) overall improvement on a suite of African language benchmarks (AfriMGSM, AfriMMLU, AfriXNLI, Belebele, FLORES, INJONG, SIB-200) compared to its base model, Qwen3-14B.
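
As a quick sanity check on the training-data figures listed above, the itemized components can be summed and compared with the stated ~26B total. This is only arithmetic on the approximate numbers given in this card. The listed items come to about 25.1B tokens, and the card does not say here what makes up the remainder (it may be the high-resource language data, but that is an assumption).

```python
# Token counts in billions, copied from the training-data bullet above.
listed_mix = {
    "African monolingual (FineWeb2, WURA, MADLAD-400)": 22.8,
    "Code (CornStack-Python)": 1.0,
    "Mathematics (FineMath-4+)": 1.0,
    "Synthetic (GPT-4.1 translated)": 0.324,
}
STATED_TOTAL = 26.0  # "~26 billion tokens" from the model description

listed_total = sum(listed_mix.values())
unlisted = STATED_TOTAL - listed_total  # portion not itemized in the bullet

for name, tokens in listed_mix.items():
    share = tokens / STATED_TOTAL
    print(f"{name}: {tokens:.3f}B ({share:.1%} of stated total)")
print(f"Listed subtotal: {listed_total:.3f}B, not itemized: ~{unlisted:.1f}B")
```

Since every figure carries a "~", the gap of roughly 0.9B tokens could also be partly rounding.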

Good for

  • Applications requiring high-quality language understanding and generation in African languages.
  • Multilingual tasks such as translation, summarization, and content creation across diverse linguistic contexts.
  • Research and development focusing on low-resource language processing and cross-lingual transfer learning.
  • Use cases benefiting from a large context window for processing long documents or complex conversational flows.
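
For long-document use, input that exceeds the 32,768-token window has to be split. The sketch below shows one simple way to do that: greedily grouping paragraphs under a token budget while reserving room for the generated output. The function names, the 1,024-token reserve, and the whitespace-based counter are all assumptions made for illustration. In practice the counter would wrap the model's own tokenizer.

```python
from typing import Callable, List

CONTEXT_LENGTH = 32_768  # native context length stated for the model


def split_to_fit(
    paragraphs: List[str],
    count_tokens: Callable[[str], int],
    reserve_for_output: int = 1_024,  # hypothetical allowance for generated tokens
    context_length: int = CONTEXT_LENGTH,
) -> List[List[str]]:
    """Greedily group paragraphs so each group fits the prompt budget."""
    budget = context_length - reserve_for_output
    if budget <= 0:
        raise ValueError("reserve_for_output must be smaller than context_length")
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for paragraph in paragraphs:
        n = count_tokens(paragraph)
        if n > budget:
            raise ValueError("a single paragraph exceeds the prompt budget")
        # Start a new chunk when adding this paragraph would overflow the budget.
        if current and used + n > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(paragraph)
        used += n
    if current:
        chunks.append(current)
    return chunks


def whitespace_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())
```

The counts here ignore any separator tokens added when paragraphs are joined and any prompt template wrapped around each chunk, so a real budget should leave some extra margin.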
