doublebean/Qwen3-32B

Text Generation · Concurrency Cost: 2 · Model Size: 32B · Quant: FP8 · Context Length: 32k · Published: Apr 17, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Qwen3-32B is a 32.8 billion parameter causal language model developed by Qwen, featuring a unique ability to seamlessly switch between a 'thinking mode' for complex reasoning and a 'non-thinking mode' for general dialogue. This model significantly enhances reasoning capabilities in mathematics, code generation, and logical reasoning, while also excelling in human preference alignment for creative writing and multi-turn conversations. It supports over 100 languages and dialects and offers advanced agent capabilities for tool integration, with a native context length of 32,768 tokens, extendable to 131,072 tokens using YaRN.


Qwen3-32B: A Dual-Mode Language Model

Qwen3-32B is a 32.8 billion parameter causal language model from the Qwen series, built on a standard Transformer architecture and designed to switch dynamically between two operational modes: a 'thinking mode' and a 'non-thinking mode'. This lets the model balance performance across diverse tasks, from complex logical reasoning to efficient general-purpose dialogue.

Key Capabilities & Differentiators

  • Adaptive Reasoning: Supports seamless switching between a dedicated 'thinking mode' for complex logical reasoning, mathematics, and coding, and a 'non-thinking mode' for efficient, general-purpose dialogue, so each request can run in the mode best suited to it.
  • Enhanced Reasoning: Demonstrates significant improvements in mathematical problem-solving, code generation, and commonsense logical reasoning, outperforming previous Qwen models.
  • Superior Human Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural and engaging conversational experience.
  • Advanced Agentic Abilities: Offers robust tool-calling capabilities, achieving leading performance among open-source models in complex agent-based tasks, particularly when integrated with frameworks like Qwen-Agent (see the sketch following this list).
  • Multilingual Support: Capable of processing and generating content in over 100 languages and dialects, with strong multilingual instruction following and translation capabilities.
  • Extended Context Window: Natively supports a context length of 32,768 tokens, which can be extended to 131,072 tokens with the YaRN scaling method for long-text processing (sketched below).
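
As a concrete illustration of the agentic point above, the sketch below wires Qwen3-32B into the Qwen-Agent framework through an OpenAI-compatible endpoint. The endpoint URL, API key, and prompt are placeholders, and the exact Qwen-Agent API should be checked against the version you install; this is a minimal sketch, not the definitive integration.

```python
# Sketch: tool calling via Qwen-Agent against an OpenAI-compatible server
# hosting Qwen3-32B (the endpoint URL and api_key below are illustrative).
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-32B",
    "model_server": "http://localhost:8000/v1",  # e.g. a local vLLM/SGLang server
    "api_key": "EMPTY",
}

# 'code_interpreter' is one of Qwen-Agent's built-in tools.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Plot y = x^2 for x in [-5, 5]."}]

# bot.run streams intermediate agent states; keep the last one as the final answer.
responses = []
for responses in bot.run(messages=messages):
    pass
print(responses)
```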

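For the extended-context point, the following sketch shows one way to enable YaRN scaling when loading the model with the Hugging Face transformers library. The rope_scaling fields mirror the settings documented upstream for Qwen3 (a factor of 4.0 over the native 32,768-token window yields roughly 131,072 tokens), but treat the exact keys as an assumption to verify against the model's configuration.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-32B"

config = AutoConfig.from_pretrained(model_name)
# YaRN rope scaling (assumed keys, per upstream Qwen3 docs):
# factor 4.0 extends the native 32,768-token window to ~131,072 tokens.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```

Because static YaRN scaling applies to all inputs, it can slightly degrade quality on short prompts, so it is typically enabled only when long-context processing is actually needed.
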
Recommended Usage

For optimal performance, each mode has its own recommended sampling parameters: Temperature=0.6, TopP=0.95, TopK=20 for thinking mode, and Temperature=0.7, TopP=0.8, TopK=20 for non-thinking mode. The model is particularly well suited to applications requiring flexible reasoning, advanced conversational AI, and complex agent workflows. A minimal loading and generation sketch follows.
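
The sketch below assumes the Hugging Face transformers library, toggles thinking mode through the chat template's enable_thinking flag, and applies the recommended thinking-mode sampling parameters; the flag and parameter values follow the upstream Qwen3 documentation, so verify them against the model revision you deploy.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are below 50?"}]

# enable_thinking=True selects thinking mode; set it to False for non-thinking dialogue.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Recommended thinking-mode sampling: Temperature=0.6, TopP=0.95, TopK=20.
# For non-thinking mode, use Temperature=0.7, TopP=0.8, TopK=20.
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```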