lichangh20/qwen3-4b-instruct-sft-swegym-iter2
The Qwen3-4B-Instruct-2507 model by Qwen is a 4.0 billion parameter instruction-tuned causal language model, part of the Qwen3 series. It features a native context length of 262,144 tokens and is specifically designed for non-thinking mode operations, excelling in instruction following, logical reasoning, and long-tail knowledge coverage across multiple languages. This model demonstrates significant improvements in general capabilities, including mathematics, science, coding, and tool usage, making it suitable for a wide range of complex generative AI applications.
Overview
Qwen3-4B-Instruct-2507 is an updated 4.0 billion parameter instruction-tuned causal language model from the Qwen3 series, designed for "non-thinking mode" operations. This iteration, building on the Qwen3-4B foundation, focuses on enhanced general capabilities and user alignment. It features a substantial native context length of 262,144 tokens, making it adept at processing and understanding extensive inputs.
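A minimal inference sketch with the `transformers` library (plus `accelerate` for automatic device placement) might look like the following. The Hub ID below refers to the base model this card describes and is an assumption; substitute the fine-tuned checkpoint as needed.

```python
# Minimal inference sketch, assuming `transformers` (and `accelerate` for
# device_map="auto") are installed. MODEL_ID is an assumed Hub checkpoint name.
MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"

def build_messages(prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat format the tokenizer's template expects."""
    return [{"role": "user", "content": prompt}]

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so the message helper stays usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

# Example (downloads several GB of weights on first call):
#   print(generate("Summarize the Qwen3 series in two sentences."))
```

Because the model runs in non-thinking mode only, no `enable_thinking` toggling is needed; the chat template is applied directly.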
Key Capabilities
- General Instruction Following: Significant improvements across instruction following, logical reasoning, and text comprehension.
- Multilingual Knowledge: Substantial gains in long-tail knowledge coverage across various languages.
- Mathematical & Scientific Reasoning: Enhanced performance in mathematics and science tasks.
- Coding & Tool Usage: Improved capabilities in code generation and effective tool utilization.
- Long-Context Understanding: Markedly better performance in understanding and processing long contexts up to 256K tokens.
- User Alignment: Better alignment with user preferences for subjective and open-ended tasks, leading to more helpful and higher-quality text generation.
Performance Highlights
The model shows strong performance across various benchmarks, often outperforming its predecessor, Qwen3-4B Non-Thinking, and, in some cases, larger models. Notable improvements include:
- Knowledge: Achieves 69.6 on MMLU-Pro and 84.2 on MMLU-Redux.
- Reasoning: Scores 47.4 on AIME25 and 80.2 on ZebraLogic.
- Coding: Reaches 35.1 on LiveCodeBench v6 and 76.8 on MultiPL-E.
- Alignment: Scores 83.4 on IFEval and 83.5 on Creative Writing v3.
- Agentic Use: Excels in tool calling, with strong results on BFCL-v3 (61.9) and TAU1-Retail (48.7).
Good For
- Applications requiring robust instruction following and logical reasoning.
- Tasks benefiting from extensive long-context understanding.
- Multilingual content generation and knowledge retrieval.
- Coding assistance and tool-use scenarios, especially with Qwen-Agent integration.
- Generating high-quality, aligned responses for subjective and open-ended prompts.
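For the tool-use scenarios above, one possible flow is to pass an OpenAI-style function schema to the chat template and route the model's tool calls through a dispatcher. This is a sketch, not the Qwen-Agent API: the `get_temperature` tool and its stubbed return value are invented for illustration, while the `tools=` argument shown in the comments is the standard `transformers` `apply_chat_template` parameter.

```python
# Hedged sketch of a tool-calling loop. The tool schema below follows the
# OpenAI-style function format that transformers chat templates accept via
# `apply_chat_template(..., tools=TOOLS)`. The weather tool is hypothetical.
import json

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_temperature",  # hypothetical example tool
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch_tool_call(call: dict) -> str:
    """Execute a tool call parsed from the model's output and return the
    result as a string to feed back in a role="tool" message."""
    args = call["arguments"]
    if isinstance(args, str):  # arguments may arrive as a JSON string
        args = json.loads(args)
    if call["name"] == "get_temperature":
        return json.dumps({"city": args["city"], "temperature_c": 21})  # stub value
    raise ValueError(f"unknown tool: {call['name']}")

# In a full loop you would render the prompt with the schema, e.g.:
#   text = tokenizer.apply_chat_template(messages, tools=TOOLS,
#                                        add_generation_prompt=True, tokenize=False)
# then parse any tool call the model emits, append the dispatcher's result as
# {"role": "tool", "content": ...}, and generate again for the final answer.
```

Qwen-Agent, mentioned above, wraps this parse-dispatch-regenerate cycle for you; the sketch shows what that cycle does underneath.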