lightonai/Qwen3-8B-DE
The lightonai/Qwen3-8B-DE is an 8 billion parameter Qwen3-based causal language model developed by lightonai, specifically fine-tuned for native German reasoning. It generates its entire reasoning trace and final answer in German, making it specialized for complex German-language tasks. With a 32,768 token context length, this model is optimized for applications requiring deep understanding and generation of German text.
Loading preview...
Overview
lightonai/Qwen3-8B-DE is an 8 billion parameter model from the Qwen3 family, developed by lightonai. It is uniquely fine-tuned to perform native reasoning in German, meaning it produces its entire chain-of-thought (CoT) and final answer in German. This model was released in conjunction with the paper "Rethinking the Multilingual Reasoning Gap with Layer Swap" and is part of a specialized German trio designed to investigate multilingual reasoning.
Key Capabilities & Features
- Native German Reasoning: Excels at generating detailed reasoning processes and answers entirely in German.
- Base Model: Fine-tuned from
Qwen/Qwen3-8B-Base. - Context Length: Supports a substantial context window of 32,768 tokens.
- Training: Underwent full Supervised Fine-Tuning (SFT) over approximately 10 billion tokens across 2 epochs, utilizing the German split of the
lightonai/Dolci-Think-SFT-32B-Multilingualdataset.
Performance Highlights
Evaluated on German versions of various benchmarks, Qwen3-8B-DE demonstrates strong performance, achieving an average accuracy of 72.59% across MGSM-Rev2, Global-MMLU-Lite, GPQA-Diamond, AIME 24/25, and HumanEvalPlus. While other related models in the trio (like Qwen3-8B-DE-Pivot-EN) show higher average scores by pivoting to English CoT, Qwen3-8B-DE stands out for its dedicated native German reasoning capability.
Good for
- Applications requiring complex problem-solving and reasoning in German.
- Use cases where the entire thought process, not just the final answer, needs to be in German.
- Research into multilingual reasoning and the impact of native language CoT generation.