Qwen3-235B-A22B Overview

Qwen3-235B-A22B is a 235 billion parameter Mixture-of-Experts (MoE) causal language model from the Qwen series, featuring 22 billion activated parameters and a native context length of 32,768 tokens, extendable to 131,072 tokens with YaRN scaling. Developed by Qwen, this model introduces a unique capability to seamlessly switch between a 'thinking mode' for complex logical reasoning, mathematics, and coding, and a 'non-thinking mode' for efficient, general-purpose dialogue.

Key Capabilities

Dynamic Reasoning Modes: Offers distinct 'thinking' and 'non-thinking' modes, allowing for optimized performance across diverse tasks, from intricate problem-solving to general conversation.
Enhanced Reasoning: Demonstrates significant improvements in mathematical problem-solving, code generation, and commonsense logical reasoning, outperforming previous Qwen models.
Superior Human Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural and engaging conversational experience.
Advanced Agentic Abilities: Integrates precisely with external tools in both thinking and non-thinking modes, achieving leading performance in complex agent-based tasks among open-source models.
Multilingual Support: Supports over 100 languages and dialects, offering strong capabilities for multilingual instruction following and translation.

Best Practices

To optimize performance, specific sampling parameters are recommended for each mode: Temperature=0.6, TopP=0.95, TopK=20 for thinking mode (avoid greedy decoding), and Temperature=0.7, TopP=0.8, TopK=20 for non-thinking mode. The model also supports dynamic switching of thinking modes via user input tags (/think, /no_think) in multi-turn conversations. For agentic use, integration with Qwen-Agent is recommended.

Overview

Qwen3-235B-A22B Overview

Key Capabilities

Best Practices

Full Model Card (README)