confamnode/Qwen3-4B-Instruct-2507

Text generation · Concurrency cost: 1 · Model size: 4B · Quant: BF16 · Ctx length: 32k · Published: Apr 30, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Qwen3-4B-Instruct-2507 is a 4.0-billion-parameter instruction-tuned causal language model from Qwen, an updated version of the Qwen3-4B non-thinking mode. It brings significant improvements in general capabilities, including instruction following, logical reasoning, and coding, alongside enhanced 256K long-context understanding. The model excels at subjective and open-ended tasks, producing more helpful responses and higher-quality text across multiple languages.


Overview

Qwen3-4B-Instruct-2507 is an updated 4.0-billion-parameter causal language model from Qwen, building upon the Qwen3-4B non-thinking mode. Its gains come from improvements in both the pretraining and post-training stages, and it has a native context length of 262,144 tokens. A key characteristic is its "non-thinking mode" operation: the model does not generate `<think></think>` blocks in its output, simplifying its use for direct instruction following.
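
As a rough quickstart, the sketch below runs a single chat turn with Hugging Face transformers. The `Qwen/Qwen3-4B-Instruct-2507` repo id, the prompt, and the 16,384-token output budget are illustrative assumptions, not part of this listing; substitute whatever repo id your deployment actually serves.

```python
# Minimal sketch: single-turn chat with Hugging Face transformers.
# The repo id below is an assumption (the upstream Qwen weights).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B-Instruct-2507"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # loads in BF16 where the checkpoint specifies it
    device_map="auto",
)

# Non-thinking mode: no thinking flag is needed, and the output
# contains no <think></think> blocks.
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=16384)
output_ids = generated[0][len(inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```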

Key Capabilities & Enhancements

  • General Performance: Demonstrates significant improvements across instruction following, logical reasoning, text comprehension, mathematics, science, and coding.
  • Long-Context Understanding: Features enhanced capabilities for processing and understanding inputs up to 256K tokens.
  • Multilingual Knowledge: Offers substantial gains in long-tail knowledge coverage across various languages.
  • User Alignment: Shows markedly better alignment with user preferences in subjective and open-ended tasks, leading to more helpful and higher-quality text generation.
  • Tool Usage & Agentic Abilities: Excels in tool calling, with recommended integration via Qwen-Agent for streamlined development (see the sketch after this list).
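
For the tool-calling path, a minimal Qwen-Agent sketch is shown below. It assumes the model is served behind a locally hosted OpenAI-compatible endpoint at `http://localhost:8000/v1` (e.g., via vLLM or SGLang); that URL and the built-in `code_interpreter` tool choice are assumptions for this example, not fixed requirements.

```python
# Minimal Qwen-Agent sketch for tool calling. Assumes an OpenAI-compatible
# server at the URL below (endpoint and served model name are assumptions).
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-4B-Instruct-2507",
    "model_server": "http://localhost:8000/v1",  # assumed local endpoint
    "api_key": "EMPTY",
}

# Built-in code_interpreter tool; swap in your own tools or MCP servers.
tools = ["code_interpreter"]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{"role": "user", "content": "Use Python to compute 2**32."}]
for responses in bot.run(messages=messages):  # yields incremental response lists
    pass
print(responses[-1]["content"])  # final agent message after tool use
```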

Performance Highlights

Qwen3-4B-Instruct-2507 shows strong performance across various benchmarks, often outperforming its predecessor, Qwen3-4B Non-Thinking, and in some cases, larger models. Notable scores include:

  • Knowledge: Achieves 69.6 on MMLU-Pro and 84.2 on MMLU-Redux.
  • Reasoning: Scores 47.4 on AIME25 and 80.2 on ZebraLogic.
  • Coding: Reaches 35.1 on LiveCodeBench v6 and 76.8 on MultiPL-E.
  • Alignment: Achieves 83.4 on IFEval and 83.5 on Creative Writing v3.
  • Agent: Scores 61.9 on BFCL-v3 and 48.7 on TAU1-Retail.

Best Practices

For optimal performance, Qwen recommends sampling with temperature=0.7, top_p=0.8, top_k=20, and min_p=0. An output length of 16,384 tokens is suggested for most queries. Specific prompting strategies are advised to standardize outputs, such as asking the model to reason step by step and put its final answer within \boxed{} for math problems, or to return only the choice letter in a fixed answer field for multiple-choice questions.
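
To make the knobs concrete, here is one hedged way to apply those settings through an OpenAI-compatible endpoint (the URL and served model name are assumptions for illustration). Since `top_k` and `min_p` are not part of the standard OpenAI request schema, they ride along in `extra_body`, which many open-source servers such as vLLM accept.

```python
# Sketch: the recommended sampling settings sent to an OpenAI-compatible
# server (base_url and model name are assumptions for illustration).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen3-4B-Instruct-2507",
    messages=[{"role": "user", "content": "Summarize the Qwen3 update in two sentences."}],
    temperature=0.7,
    top_p=0.8,
    max_tokens=16384,                      # suggested output budget
    extra_body={"top_k": 20, "min_p": 0},  # non-standard params, server-specific
)
print(response.choices[0].message.content)
```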