Qwen3-4B-Thinking-2507: Enhanced Reasoning and Long-Context LLM
Qwen3-4B-Thinking-2507 is a 4.0-billion-parameter causal language model in the Qwen3 series, designed specifically to excel at complex reasoning tasks. Developed by Qwen, this iteration significantly scales the model's "thinking capability," improving both the quality and depth of its reasoning processes.
Key Capabilities & Enhancements
- Superior Reasoning: Demonstrates significantly improved performance across logical reasoning, mathematics, science, coding, and academic benchmarks that typically demand human expertise.
- General Capability Boost: Shows markedly better instruction following, tool usage, text generation, and alignment with human preferences.
- Extended Context Understanding: Features an enhanced native context length of 262,144 tokens, making it highly effective for long-context understanding.
- Dedicated Thinking Mode: This version operates exclusively in thinking mode, so `enable_thinking=True` is no longer required. The default chat template automatically includes `<think>` tags to enforce this behavior.
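Because the chat template opens the response with `<think>`, the generated text typically contains only the closing `</think>` tag before the final answer. A minimal sketch of separating the reasoning trace from the answer (the helper name `split_thinking` is our own illustration, not part of any Qwen API):

```python
def split_thinking(output_text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, final_answer).

    Assumes the chat template already emitted the opening <think> tag,
    so the model's output contains at most the closing </think>.
    """
    marker = "</think>"
    if marker in output_text:
        thinking, _, answer = output_text.partition(marker)
        return thinking.strip(), answer.strip()
    # No closing tag found: treat the whole output as the answer.
    return "", output_text.strip()

# Hypothetical model output for illustration:
raw = "The user asks 2+2. Basic arithmetic gives 4.</think>2 + 2 = 4."
thinking, answer = split_thinking(raw)
```

In practice you would apply this to the decoded tokens returned by `model.generate`; splitting on the token text keeps the parsing independent of the tokenizer's exact token IDs.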
Performance Highlights
Compared to its predecessor, Qwen3-4B-Thinking-2507 shows notable gains across various benchmarks:
- Knowledge: Achieves 74.0 on MMLU-Pro and 86.1 on MMLU-Redux.
- Reasoning: Scores 81.3 on AIME25 and 55.5 on HMMT25, indicating substantial improvements.
- Coding: Reaches 55.2 on LiveCodeBench v6 and 1852 on CFEval.
- Alignment: Shows strong performance with 87.4 on IFEval and 75.6 on Creative Writing v3.
Best Practices for Optimal Use
To maximize performance, use the recommended sampling parameters (e.g., Temperature=0.6, TopP=0.95) and allow adequate output length: 32,768 tokens for most queries and up to 81,920 tokens for highly complex problems. The model also excels at tool calling; Qwen-Agent is recommended for agentic applications.
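The recommendations above can be collected into generation keyword arguments for a call such as `model.generate(**inputs, **generation_kwargs())` with Hugging Face transformers. The helper below is a sketch of our own; only the temperature, top-p, and token-budget values come from the guidance above:

```python
def generation_kwargs(complex_problem: bool = False) -> dict:
    """Return sampling settings recommended for Qwen3-4B-Thinking-2507.

    `complex_problem=True` raises the output budget from 32,768 to
    81,920 tokens, as suggested for highly complex queries.
    """
    return {
        "do_sample": True,          # sampling, not greedy decoding
        "temperature": 0.6,         # recommended temperature
        "top_p": 0.95,              # recommended nucleus-sampling cutoff
        "max_new_tokens": 81920 if complex_problem else 32768,
    }
```

Keeping these in one place makes it easy to pass the same configuration to both direct `generate` calls and agent frameworks that accept generation overrides.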