lichangh20/qwen3-4b-instruct-sft-swegym-iter2

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quantization: BF16 · Context Length: 32K · Published: Apr 24, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The Qwen3-4B-Instruct-2507 model by Qwen is a 4.0-billion-parameter instruction-tuned causal language model in the Qwen3 series. It has a native context length of 262,144 tokens and operates exclusively in non-thinking mode, excelling at instruction following, logical reasoning, and long-tail knowledge coverage across multiple languages. The model brings significant improvements in general capabilities, including mathematics, science, coding, and tool usage, making it suitable for a wide range of generative AI applications.


Overview

Qwen3-4B-Instruct-2507 is an updated 4.0 billion parameter instruction-tuned causal language model from the Qwen3 series, designed for "non-thinking mode" operations. This iteration, building on the Qwen3-4B foundation, focuses on enhanced general capabilities and user alignment. It features a substantial native context length of 262,144 tokens, making it adept at processing and understanding extensive inputs.
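The snippet below is a minimal inference sketch using Hugging Face transformers. The model ID points at the base Qwen3-4B-Instruct-2507 checkpoint named above; the prompt and generation settings are illustrative placeholders, not tuned recommendations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B-Instruct-2507"  # base checkpoint named in this card
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # picks up the BF16 weights listed in the metadata
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain causal language models in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # open the assistant turn for non-thinking chat
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```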

Key Capabilities

  • General Instruction Following: Significant improvements across instruction following, logical reasoning, and text comprehension.
  • Multilingual Knowledge: Substantial gains in long-tail knowledge coverage across various languages.
  • Mathematical & Scientific Reasoning: Enhanced performance in mathematics and science tasks.
  • Coding & Tool Usage: Improved capabilities in code generation and effective tool utilization.
  • Long-Context Understanding: Markedly better performance in understanding and processing long contexts up to 256K tokens (see the long-context sketch after this list).
  • User Alignment: Better alignment with user preferences for subjective and open-ended tasks, leading to more helpful and higher-quality text generation.
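
As a concrete long-context illustration, here is a sketch using vLLM's offline Python API. The 262,144-token window matches the native context length stated above; the input file name and sampling settings are placeholders, and you may need to lower max_model_len on memory-constrained GPUs.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-4B-Instruct-2507",
    max_model_len=262144,  # native window per this card; reduce if KV-cache memory is tight
)

with open("long_report.txt") as f:  # hypothetical long input document
    document = f.read()

messages = [
    {"role": "user", "content": f"Summarize the key findings of this report:\n\n{document}"},
]
params = SamplingParams(temperature=0.7, max_tokens=512)

# llm.chat applies the model's chat template before generating
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```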

Performance Highlights

The model shows strong performance across various benchmarks, often outperforming its predecessor, Qwen3-4B Non-Thinking, and in some cases larger models. Notable improvements include:

  • Knowledge: Achieves 69.6 on MMLU-Pro and 84.2 on MMLU-Redux.
  • Reasoning: Scores 47.4 on AIME25 and 80.2 on ZebraLogic.
  • Coding: Reaches 35.1 on LiveCodeBench v6 and 76.8 on MultiPL-E.
  • Alignment: Demonstrates 83.4 on IFEval and 83.5 on Creative Writing v3.
  • Agentic Use: Excels in tool calling, with strong results on BFCL-v3 (61.9) and TAU1-Retail (48.7).

Good For

  • Applications requiring robust instruction following and logical reasoning.
  • Tasks benefiting from extensive long-context understanding.
  • Multilingual content generation and knowledge retrieval.
  • Coding assistance and tool-use scenarios, especially with Qwen-Agent integration (see the sketch after this list).
  • Generating high-quality, aligned responses for subjective and open-ended prompts.
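
For the Qwen-Agent scenario above, a hedged tool-calling sketch follows. The local OpenAI-compatible endpoint and the built-in code_interpreter tool are assumptions; swap in whatever server and tools you actually run.

```python
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "Qwen3-4B-Instruct-2507",
    "model_server": "http://localhost:8000/v1",  # assumed OpenAI-compatible endpoint
    "api_key": "EMPTY",
}

# function_list accepts built-in tool names or MCP server configs
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Plot y = x^2 for x in [-5, 5]."}]
for responses in bot.run(messages=messages):
    pass  # bot.run streams incremental responses; keep the final batch
print(responses)
```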