unsloth/Qwen3-235B-A22B

TEXT GENERATIONConcurrency Cost:4Model Size:235BQuant:FP8Ctx Length:32kPublished:May 9, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Qwen3-235B-A22B is a 235 billion parameter Mixture-of-Experts (MoE) causal language model developed by Qwen, with 22 billion activated parameters and a native context length of 32,768 tokens. It uniquely supports seamless switching between a 'thinking mode' for complex reasoning, math, and coding, and a 'non-thinking mode' for efficient general dialogue. This model excels in reasoning capabilities, human preference alignment for creative writing and role-playing, and agentic tasks with external tool integration, supporting over 100 languages.

Loading preview...

Qwen3-235B-A22B Overview

Qwen3-235B-A22B is a 235 billion parameter Mixture-of-Experts (MoE) causal language model from the Qwen series, featuring 22 billion activated parameters and a native context length of 32,768 tokens, extendable to 131,072 tokens with YaRN scaling. Developed by Qwen, this model introduces a unique capability to seamlessly switch between a 'thinking mode' for complex logical reasoning, mathematics, and coding, and a 'non-thinking mode' for efficient, general-purpose dialogue.

Key Capabilities

  • Dynamic Reasoning Modes: Offers distinct 'thinking' and 'non-thinking' modes, allowing for optimized performance across diverse tasks, from intricate problem-solving to general conversation.
  • Enhanced Reasoning: Demonstrates significant improvements in mathematical problem-solving, code generation, and commonsense logical reasoning, outperforming previous Qwen models.
  • Superior Human Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural and engaging conversational experience.
  • Advanced Agentic Abilities: Integrates precisely with external tools in both thinking and non-thinking modes, achieving leading performance in complex agent-based tasks among open-source models.
  • Multilingual Support: Supports over 100 languages and dialects, offering strong capabilities for multilingual instruction following and translation.

Best Practices

To optimize performance, specific sampling parameters are recommended for each mode: Temperature=0.6, TopP=0.95, TopK=20 for thinking mode (avoid greedy decoding), and Temperature=0.7, TopP=0.8, TopK=20 for non-thinking mode. The model also supports dynamic switching of thinking modes via user input tags (/think, /no_think) in multi-turn conversations. For agentic use, integration with Qwen-Agent is recommended.