Qwen/Qwen3-4B-Instruct-2507

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Aug 5, 2025License:apache-2.0Architecture:Transformer0.8K Open Weights Warm

Qwen/Qwen3-4B-Instruct-2507 is a 4.0 billion parameter instruction-tuned causal language model developed by Qwen, featuring a native context length of 262,144 tokens. This updated version of the Qwen3-4B non-thinking mode demonstrates significant improvements across general capabilities including instruction following, logical reasoning, mathematics, coding, and long-tail knowledge coverage in multiple languages. It excels in subjective and open-ended tasks, providing helpful responses and high-quality text generation, making it suitable for diverse conversational AI and agentic applications.

Loading preview...

Qwen3-4B-Instruct-2507: Enhanced Instruction-Following LLM

Qwen3-4B-Instruct-2507 is an updated 4.0 billion parameter instruction-tuned causal language model from Qwen, building upon the Qwen3-4B non-thinking mode. It features a substantial native context length of 262,144 tokens, enabling advanced long-context understanding.

Key Capabilities and Enhancements

  • General Capabilities: Significant improvements in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
  • Knowledge & Multilingualism: Substantial gains in long-tail knowledge coverage across multiple languages, as evidenced by strong performance on MMLU-ProX and PolyMATH benchmarks.
  • Alignment & Subjectivity: Markedly better alignment with user preferences in subjective and open-ended tasks, leading to more helpful responses and higher-quality text generation.
  • Agentic Use: Excels in tool-calling capabilities, with recommendations to use Qwen-Agent for optimal integration.

Performance Highlights

The model demonstrates strong performance across various benchmarks, often surpassing its predecessor, Qwen3-4B Non-Thinking, and in some cases, even larger models. Notable improvements include:

  • Knowledge: Achieves 69.6 on MMLU-Pro and 84.2 on MMLU-Redux.
  • Reasoning: Scores 47.4 on AIME25 and 80.2 on ZebraLogic.
  • Coding: Reaches 35.1 on LiveCodeBench v6 and 76.8 on MultiPL-E.
  • Alignment: Scores 83.4 on IFEval and 83.5 on Creative Writing v3.

Recommended Use Cases

This model is particularly well-suited for applications requiring:

  • Advanced instruction following and complex reasoning.
  • Long-context understanding and generation.
  • High-quality, aligned responses in subjective and open-ended conversational scenarios.
  • Agentic workflows and tool-use integration.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p