lightonai/Qwen3-8B-SW

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Apr 12, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

lightonai/Qwen3-8B-SW is an 8 billion parameter native reasoning model, fine-tuned from Qwen/Qwen3-8B-Base, specifically designed to perform reasoning tasks entirely in Swahili. It generates its complete chain-of-thought and final answer in Swahili, leveraging a 32,768 token context length. This model is optimized for Swahili-language problem-solving and is part of a research effort to study multilingual reasoning gaps.

Loading preview...

Overview

lightonai/Qwen3-8B-SW is an 8 billion parameter language model developed by lightonai, fine-tuned from Qwen/Qwen3-8B-Base. Its primary distinction is its capability to perform native reasoning in Swahili, generating both its entire chain-of-thought (CoT) and the final answer in Swahili. This model was developed as part of the research presented in the paper "Rethinking the Multilingual Reasoning Gap with Layer Swap" and is trained on approximately 10 billion tokens over 2 epochs using the Swahili split of the lightonai/Dolci-Think-SFT-32B-Multilingual dataset.

Key Capabilities

  • Swahili Native Reasoning: Produces detailed reasoning traces and answers exclusively in Swahili.
  • Extended Context Window: Supports a context length of 32,768 tokens, allowing for processing longer Swahili texts and complex problems.
  • Specialized Multilingual Research: Forms part of a trio of Swahili specialist models designed to investigate the multilingual reasoning gap, alongside Qwen3-8B-SW-Swap and Qwen3-8B-SW-Pivot-EN.

Performance

Evaluated on Swahili versions of various benchmarks, Qwen3-8B-SW demonstrates strong performance in Swahili reasoning tasks. For instance, it achieves 93.16% on MGSM-Rev2 and 82.69% on HumanEvalPlus in Swahili, with an average score of 66.98% across multiple benchmarks including Global-MMLU-Lite, GPQA-Diamond, and AIME 24/25.

Good For

  • Swahili-centric AI applications: Ideal for use cases requiring deep understanding and generation of Swahili text, particularly for problem-solving and reasoning.
  • Research in multilingual LLMs: A valuable tool for researchers studying language-specific reasoning, chain-of-thought processes, and the multilingual reasoning gap.
  • Educational tools for Swahili speakers: Can be integrated into platforms that require explaining complex concepts or solving problems in Swahili.