Qwen3-4B-Thinking-2507 is a 4.0 billion parameter causal language model developed by Qwen, specifically enhanced for complex reasoning tasks. It features a substantial 262,144 token context length and significantly improved performance across logical reasoning, mathematics, science, and coding benchmarks. This model is optimized for scenarios requiring deep analytical thought and advanced problem-solving capabilities.
Overview
Qwen3-4B-Thinking-2507 is a 4.0 billion parameter causal language model from Qwen, designed with a strong emphasis on thinking capability and complex reasoning. It builds upon previous Qwen3-4B versions, offering significant improvements in both the quality and depth of reasoning across various domains.
Key Enhancements & Capabilities
- Enhanced Reasoning: Demonstrates markedly improved performance on logical reasoning, mathematics, science, coding, and academic benchmarks requiring human-level expertise.
- General Capabilities: Features better instruction following, tool usage, text generation, and alignment with human preferences.
- Extended Context: Natively supports a 262,144-token context length, making it suitable for complex reasoning tasks over very long inputs.
- Dedicated Thinking Mode: This model operates exclusively in thinking mode; the chat template automatically inserts an opening `<think>` tag to initiate internal reasoning.
- Agentic Use: Excels at tool calling; Qwen-Agent is recommended for streamlined integration.
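Since the chat template already emits the opening `<think>` tag, the generated text typically consists of the reasoning followed by a closing `</think>` tag and then the final answer. A minimal post-processing helper (a sketch, not part of any official API) might separate the two:

```python
def split_thinking(generated: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumes the chat template supplied the opening <think> tag, so the
    output looks like "reasoning</think>answer". If no closing tag is
    present, the whole string is treated as the final answer.
    """
    marker = "</think>"
    idx = generated.find(marker)
    if idx == -1:
        return "", generated.strip()
    return generated[:idx].strip(), generated[idx + len(marker):].strip()


reasoning, answer = split_thinking("First, 2 + 2 = 4.</think>The answer is 4.")
```

In this sketch, `reasoning` holds the internal chain of thought and `answer` the user-facing response, which is useful when only the final answer should be displayed.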
Performance Highlights
Compared to its predecessor, Qwen3-4B-Thinking-2507 shows notable gains across various benchmarks:
- Reasoning: Achieves 81.3 on AIME25 and 55.5 on HMMT25, surpassing previous versions.
- Coding: Scores 55.2 on LiveCodeBench v6.
- Alignment: Reaches 87.4 on IFEval and 75.6 on Creative Writing v3.
- Agent: Shows significant improvements in BFCL-v3 and TAU benchmarks.
Recommended Use Cases
This model is particularly well-suited for applications requiring deep analytical processing, complex problem solving, and advanced logical inference, especially where long-context understanding is critical, such as highly complex reasoning tasks and agentic workflows.
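For agentic workflows, tools are commonly described to the model with JSON-schema-style definitions, and the model's tool calls come back as JSON arguments to parse and dispatch. The tool name and fields below are illustrative only, not taken from the Qwen documentation:

```python
import json

# Hypothetical tool definition in the widely used OpenAI-style
# function-calling format that many serving stacks accept.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# A tool call from the model arrives as a JSON argument string;
# the agent loop parses it and invokes the matching function.
raw_call = '{"city": "Berlin"}'
args = json.loads(raw_call)
```

In practice, a framework such as Qwen-Agent handles this schema registration and call dispatch loop so that application code only supplies the tool implementations.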