Qwen3-4B-Thinking-2507 is a 4.0 billion parameter causal language model developed by Qwen, specifically enhanced for complex reasoning tasks. It features a substantial 262,144 token context length and significantly improved performance across logical reasoning, mathematics, science, and coding benchmarks. This model is optimized for scenarios requiring deep analytical thought and advanced problem-solving capabilities.
Overview
Qwen3-4B-Thinking-2507 is a 4.0 billion parameter causal language model from Qwen, designed with a strong emphasis on thinking capability and complex reasoning. It builds upon previous Qwen3-4B versions, offering significant improvements in both the quality and depth of reasoning across various domains.
Key Enhancements & Capabilities
- Enhanced Reasoning: Demonstrates markedly improved performance on logical reasoning, mathematics, science, coding, and academic benchmarks requiring human-level expertise.
- General Capabilities: Features better instruction following, tool usage, text generation, and alignment with human preferences.
- Extended Context: Natively supports a 262,144-token context length, making it suitable for complex reasoning tasks over very long inputs.
- Dedicated Thinking Mode: This model operates exclusively in thinking mode; the chat template automatically inserts an opening `<think>` tag to initiate internal reasoning.
- Agentic Use: Excels at tool calling; Qwen-Agent is recommended for streamlined integration.
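Since the chat template already emits the opening `<think>` tag, the generated text typically consists of the reasoning followed by a closing `</think>` tag and then the final answer. A minimal post-processing helper (a sketch, not part of any official API) might separate the two:

```python
def split_thinking(generated: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumes the chat template supplied the opening <think> tag, so the
    output looks like "reasoning</think>answer". If no closing tag is
    present, the whole string is treated as the final answer.
    """
    marker = "</think>"
    idx = generated.find(marker)
    if idx == -1:
        return "", generated.strip()
    return generated[:idx].strip(), generated[idx + len(marker):].strip()


reasoning, answer = split_thinking("First, 2 + 2 = 4.</think>The answer is 4.")
```

In this sketch, `reasoning` holds the internal chain of thought and `answer` the user-facing response, which is useful when only the final answer should be displayed.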
Performance Highlights
Compared to its predecessor, Qwen3-4B-Thinking-2507 shows notable gains across various benchmarks:
- Reasoning: Achieves 81.3 on AIME25 and 55.5 on HMMT25, surpassing previous versions.
- Coding: Scores 55.2 on LiveCodeBench v6.
- Alignment: Reaches 87.4 on IFEval and 75.6 on Creative Writing v3.
- Agent: Shows significant improvements in BFCL-v3 and TAU benchmarks.
Recommended Use Cases
This model is particularly well-suited for applications requiring deep analytical processing, complex problem solving, and advanced logical inference, especially where long-context understanding is critical, such as highly complex reasoning tasks and agentic workflows.
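For agentic workflows, tools are commonly described to the model with JSON-schema-style definitions, and the model's tool calls come back as JSON arguments to parse and dispatch. The tool name and fields below are illustrative only, not taken from the Qwen documentation:

```python
import json

# Hypothetical tool definition in the widely used OpenAI-style
# function-calling format that many serving stacks accept.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# A tool call from the model arrives as a JSON argument string;
# the agent loop parses it and invokes the matching function.
raw_call = '{"city": "Berlin"}'
args = json.loads(raw_call)
```

In practice, a framework such as Qwen-Agent handles this schema registration and call dispatch loop so that application code only supplies the tool implementations.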