lightonai/Qwen3-8B-ZH

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Apr 10, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

lightonai/Qwen3-8B-ZH is an 8 billion parameter Qwen3-based language model developed by lightonai, specifically fine-tuned for native Chinese reasoning. It produces its entire reasoning trace and final answer in Chinese, making it a specialist for Chinese-language complex problem-solving. With a 32,768 token context length, this model is optimized for tasks requiring detailed Chinese chain-of-thought processing.

Loading preview...

Overview

lightonai/Qwen3-8B-ZH is an 8 billion parameter model based on Qwen/Qwen3-8B-Base, developed by lightonai. It is uniquely fine-tuned to perform native Chinese reasoning, generating its entire chain-of-thought (CoT) and final answer exclusively in Chinese. This model was released in conjunction with the paper "Rethinking the Multilingual Reasoning Gap with Layer Swap," investigating multilingual reasoning capabilities.

Key Capabilities

  • Native Chinese Reasoning: Specializes in generating detailed reasoning traces and answers in Chinese.
  • Extended Context Window: Supports a context length of 32,768 tokens, suitable for processing longer Chinese texts and complex problems.
  • Training Data: Fine-tuned on the Chinese split of the lightonai/Dolci-Think-SFT-32B-Multilingual dataset for approximately 10 billion tokens over 2 epochs.

Performance Highlights

Evaluated on Chinese versions of various benchmarks, Qwen3-8B-ZH achieves competitive scores, including 88.92% on MGSM-Rev2 and 74.85% on Global-MMLU-Lite. It is part of a trio of Chinese specialist models designed to study the multilingual reasoning gap, offering a direct comparison to English-centric or pivot-language approaches.

Good For

  • Applications requiring complex problem-solving and reasoning in Chinese.
  • Research into multilingual reasoning gaps and native language CoT generation.
  • Developers needing a robust 8B parameter model with strong Chinese language capabilities and a long context window.