dnotitia/Qwen3-4B-Instruct-2507

Hosted on Hugging Face · Text Generation

Model size: 4B · Quant: BF16 · Ctx length: 32k · Published: Oct 31, 2025 · License: apache-2.0 · Architecture: Transformer (open weights)

dnotitia/Qwen3-4B-Instruct-2507 is a 4 billion parameter instruction-tuned causal language model based on the Qwen3 architecture and patched by Dnotitia for improved training compatibility. This updated version, Qwen3-4B-Instruct-2507, significantly improves general capabilities, including instruction following, logical reasoning, mathematics, coding, and tool usage, and extends long-context understanding to 256K tokens. It is designed for efficient training experiments and excels at subjective and open-ended tasks, producing more helpful, higher-quality text.


Qwen3-4B-Instruct-2507: Enhanced Instruction-Tuned Model

dnotitia/Qwen3-4B-Instruct-2507 is an updated 4 billion parameter instruction-tuned causal language model in the Qwen3 series. This version, patched by Dnotitia, keeps the original Qwen3 weights but ships a refactored chat template (chat_template.jinja) with injected {% generation %} tags, which let the trl library's assistant_only_loss restrict the training loss to assistant tokens.
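The {% generation %} tags mark which spans of the rendered conversation belong to the assistant, so a trainer can mask the loss to those tokens. A minimal, hypothetical sketch of how such a template is typically laid out (the actual chat_template.jinja in the repository is more elaborate):

```jinja
{%- for message in messages -%}
  {%- if message.role == 'assistant' -%}
<|im_start|>assistant
{% generation %}{{ message.content }}{% endgeneration %}<|im_end|>
  {%- else -%}
<|im_start|>{{ message.role }}
{{ message.content }}<|im_end|>
  {%- endif -%}
{%- endfor -%}
```

Only the text inside {% generation %}…{% endgeneration %} contributes to the assistant-token loss mask; everything else (system prompts, user turns, role markers) is excluded from the loss.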

Key Enhancements & Capabilities

This model operates only in non-thinking mode (it does not emit think blocks) and demonstrates significant improvements across several domains:

  • General Capabilities: Substantial gains in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
  • Long-Context Understanding: Enhanced capabilities with a native context length of 262,144 tokens.
  • Multilingualism: Improved long-tail knowledge coverage across multiple languages.
  • User Alignment: Markedly better alignment with user preferences in subjective and open-ended tasks, leading to more helpful responses and higher-quality text generation.
  • Agentic Use: Excels in tool calling, with recommendations to use Qwen-Agent for streamlined integration.
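To make the tool-calling capability concrete, here is a hedged sketch of an OpenAI-style tool-calling exchange of the kind such instruct models are commonly driven with; the `get_weather` function, its schema, and the message contents are invented for illustration, and frameworks like Qwen-Agent handle the parse-and-execute loop for you:

```python
import json

# Hypothetical tool schema advertised to the model.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # made-up tool for illustration
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [
    {"role": "user", "content": "What's the weather in Seoul?"},
    # The model would emit a structured tool call like this one.
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": json.dumps({"city": "Seoul"})},
    }]},
    # The executed tool's result is fed back so the model can
    # compose a final natural-language answer in the next turn.
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 21}'},
]

call = messages[1]["tool_calls"][0]["function"]
print(call["name"], json.loads(call["arguments"])["city"])
```

The key point is that the model emits the call as structured JSON arguments rather than free text, and the runtime (e.g. Qwen-Agent) is responsible for executing the function and appending the tool result back into the conversation.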

Performance Highlights

Benchmarks show Qwen3-4B-Instruct-2507 outperforming its predecessor, Qwen3-4B Non-Thinking, and often competing with or surpassing larger models in its class across various metrics:

  • Knowledge: Achieves 69.6 on MMLU-Pro and 84.2 on MMLU-Redux.
  • Reasoning: Scores 47.4 on AIME25 and 80.2 on ZebraLogic.
  • Coding: Reaches 76.8 on MultiPL-E and 35.1 on LiveCodeBench v6.
  • Alignment: Scores 83.5 on Creative Writing v3 and 83.4 on WritingBench.
  • Agent: Achieves 61.9 on BFCL-v3 and 48.7 on TAU1-Retail.

Best Practices

For optimal performance, recommended sampling parameters include Temperature=0.7, TopP=0.8, TopK=20, and MinP=0. An output length of 16,384 tokens is suggested for most queries. Specific prompting strategies are advised for math problems and multiple-choice questions to standardize output formats.
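To illustrate how those sampling parameters interact, here is a minimal pure-Python sketch of the filtering chain (temperature scaling, then top-k, top-p, and min-p over the softmax). This is an illustrative reimplementation, not the inference engine's actual code, and the function name is made up:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_p=0.8, top_k=20,
                      min_p=0.0, rng=None):
    """Sketch of the recommended sampling chain for this model."""
    rng = rng or random.Random(0)  # seeded for reproducibility here
    # 1. Temperature scaling, then softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # 2. Top-k: keep only the k most probable tokens.
    keep = set(order[:top_k])
    # 3. Top-p: smallest prefix whose cumulative mass reaches top_p.
    cum, nucleus = 0.0, set()
    for i in order:
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    keep &= nucleus
    # 4. Min-p: drop tokens below min_p times the max probability.
    p_max = probs[order[0]]
    keep = {i for i in keep if probs[i] >= min_p * p_max}
    # 5. Renormalize over survivors and sample.
    mass = sum(probs[i] for i in keep)
    r = rng.random() * mass
    for i in order:
        if i in keep:
            r -= probs[i]
            if r <= 0:
                return i
    return order[0]
```

With the recommended MinP=0, step 4 is a no-op; raising min_p prunes low-confidence tokens relative to the most likely one, which some engines expose as an alternative to top-p.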