junchao-cuhk/qwen3-llava

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Apr 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Qwen3-4B is a 4-billion-parameter causal language model from the Qwen team. It supports seamless switching between a 'thinking mode' for complex reasoning, math, and coding and a 'non-thinking mode' for general dialogue, so each request gets an appropriate trade-off between depth and latency. The model offers enhanced reasoning, instruction-following, and agent capabilities, along with support for over 100 languages, and has a native context length of 32,768 tokens, extendable to 131,072 tokens with YaRN.


Overview

Qwen3-4B is a 4-billion-parameter causal language model from the Qwen series with strong reasoning, instruction-following, and agent capabilities. Its distinguishing feature is seamless switching between a 'thinking mode' for complex tasks such as mathematics, code generation, and logical reasoning, and a 'non-thinking mode' for efficient, general-purpose dialogue. This dual-mode design lets one model cover both deliberate and low-latency use cases.
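In thinking mode, Qwen3 models emit their chain of thought inside `<think>...</think>` tags before the final answer. A minimal sketch of separating the two from raw output text (the tag format follows the model's documented convention; the helper name is ours, not part of any API):

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Separate the <think>...</think> reasoning block from the final answer.

    Returns (thinking, answer); thinking is "" when the model ran in
    non-thinking mode and emitted no <think> block.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    thinking = match.group(1).strip()
    answer = output[match.end():].strip()
    return thinking, answer

raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
thinking, answer = split_thinking(raw)
```

Keeping the reasoning trace separate is useful for logging or display, since downstream consumers usually only want the final answer.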

Key Capabilities

  • Adaptive Reasoning: Dynamically switches between a dedicated 'thinking mode' for complex problem-solving and a 'non-thinking mode' for general conversations, enhancing efficiency and accuracy.
  • Enhanced Reasoning: Demonstrates significant improvements in mathematical problem-solving, code generation, and commonsense logical reasoning compared to previous Qwen models.
  • Superior Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural and engaging user experience.
  • Agentic Expertise: Offers strong tool-calling capabilities, achieving leading performance among open-source models in complex agent-based tasks, especially when integrated with Qwen-Agent.
  • Multilingual Support: Supports over 100 languages and dialects with robust multilingual instruction following and translation abilities.
  • Extended Context: Natively handles context lengths up to 32,768 tokens, with support for up to 131,072 tokens using the YaRN method for processing long texts.
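Extending the context from the native 32,768 tokens toward 131,072 is typically done by adding a `rope_scaling` block to the model's `config.json`. The fragment below is a sketch following the common YaRN convention (factor 4.0 = 131,072 / 32,768); the exact keys and values should be checked against the official model card before use:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Note that static scaling like this can slightly degrade quality on short inputs, so it is usually enabled only when long-context processing is actually required.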

When to Use This Model

  • Complex Problem Solving: Ideal for applications requiring advanced logical reasoning, mathematical computations, or code generation, leveraging its 'thinking mode'.
  • General Conversational AI: Suitable for chatbots and dialogue systems where efficient, general-purpose responses are needed, utilizing its 'non-thinking mode'.
  • Agent-based Applications: Excellent for scenarios requiring precise integration with external tools and complex agent workflows.
  • Multilingual Applications: A strong candidate for applications needing robust performance across a wide array of languages and dialects.
  • Long Document Processing: Beneficial for tasks involving extensive text analysis or generation due to its extended context window capabilities.
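The thinking/non-thinking choice can also be made per turn: Qwen3 honors `/think` and `/no_think` soft switches appended to a user message in multi-turn chat, with the most recent switch taking effect. A sketch of a small helper that tags a message accordingly (the function name is ours, not part of any library API):

```python
def tag_user_message(content: str, think: bool) -> dict:
    """Build a chat message dict with Qwen3's soft switch appended.

    /think requests thinking mode for this turn; /no_think disables it.
    """
    switch = "/think" if think else "/no_think"
    return {"role": "user", "content": f"{content} {switch}"}

# Force deliberate reasoning for a tricky question, and fast replies otherwise.
messages = [
    tag_user_message("How many r's are in 'strawberry'?", think=True),
]
```

This keeps routine turns cheap while still allowing deliberate reasoning on demand within the same conversation.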