zhezi12138/Qwen3-4B_RL
Qwen3-4B-Instruct-2507: Enhanced 4B Causal Language Model
Qwen3-4B-Instruct-2507 is an updated 4 billion parameter causal language model from Qwen, building upon the Qwen3-4B non-thinking mode. It features a substantial native context length of 262,144 tokens, making it highly capable for processing extensive inputs. This model is specifically designed to operate in a "non-thinking" mode, meaning it does not generate <think></think> blocks in its output, simplifying its use.
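Since the model responds directly without `<think></think>` blocks, conversations are serialized with the standard ChatML-style template used by Qwen chat models. The sketch below illustrates that format; the helper name `build_prompt` is illustrative, not an official API, and in practice the tokenizer's own chat template should be used.

```python
# Minimal sketch of the ChatML-style prompt format used by Qwen chat models.
# The special tokens follow the standard Qwen template; the helper itself
# is illustrative (use tokenizer.apply_chat_template in real code).

def build_prompt(messages):
    """Serialize a list of {role, content} dicts into a ChatML prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to LLMs."},
])
print(prompt)
```

Because there is no thinking mode, the model begins its answer immediately after the open `assistant` turn.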
Key Capabilities and Enhancements
- General Capabilities: Demonstrates significant improvements across instruction following, logical reasoning, text comprehension, mathematics, science, and coding.
- Long-Tail Knowledge: Achieves substantial gains in knowledge coverage across multiple languages.
- User Alignment: Markedly better alignment with user preferences for subjective and open-ended tasks, leading to more helpful responses and higher-quality text generation.
- Long-Context Understanding: Enhanced capabilities in understanding and processing information within its 256K long context window.
- Tool Usage: Excels in tool calling; Qwen recommends pairing the model with Qwen-Agent to get the best agentic performance.
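To make the tool-calling capability concrete, here is a hedged sketch of the loop that a framework like Qwen-Agent automates: the application declares tools in a JSON schema, the model emits a structured call, and the application dispatches it. The OpenAI-style schema shape, the `get_weather` tool, and the dispatcher are illustrative assumptions, not part of the model card.

```python
import json

# Sketch of a tool-calling round trip. The schema follows the common
# OpenAI-style function format; the tool name and dispatcher are
# illustrative assumptions (Qwen-Agent handles this plumbing for you).

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city):
    # Stand-in implementation for the sketch.
    return {"city": city, "forecast": "sunny"}

def dispatch(tool_call_json):
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(tool_call_json)
    fn = {"get_weather": get_weather}[call["name"]]
    return fn(**call["arguments"])

# Simulate a tool call as the model might emit it.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Beijing"}}')
print(result)
```

The tool result would then be appended to the conversation as a `tool` message so the model can compose its final answer.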
Performance Highlights
In benchmarks against comparable models, Qwen3-4B-Instruct-2507 performs strongly:
- Knowledge: Achieves 69.6 on MMLU-Pro and 62.0 on GPQA, outperforming Qwen3-4B Non-Thinking and GPT-4.1-nano-2025-04-14.
- Reasoning: Scores 47.4 on AIME25 and 80.2 on ZebraLogic, indicating significant improvements.
- Coding: Reaches 35.1 on LiveCodeBench v6 and 76.8 on MultiPL-E.
- Alignment: Scores 83.4 on IFEval and 83.5 on Creative Writing v3, showing strong user preference alignment.
Recommended Use Cases
This model is well-suited for applications requiring:
- Advanced instruction following and complex reasoning.
- High-quality text generation in open-ended scenarios.
- Processing and understanding very long documents or conversations.
- Multilingual applications and tasks requiring broad knowledge.
- Agentic workflows and tool-use integration.
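For long-document use cases, it helps to check an input against the 262,144-token native window before sending it. The sketch below uses a crude characters-per-token heuristic; the ratio is an assumption that varies by language, so the model's actual tokenizer should be used for exact counts.

```python
# Rough pre-flight check against the 262,144-token native context window.
# CHARS_PER_TOKEN is a crude heuristic, not the model's real tokenizer;
# use the actual tokenizer for exact counts.

CONTEXT_WINDOW = 262_144
CHARS_PER_TOKEN = 4  # assumption; varies by language and content

def fits_in_context(text, reserve_for_output=2_048):
    """Estimate whether `text` plus an output budget fits the window."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

print(fits_in_context("hello " * 1000))   # short input
print(fits_in_context("x" * 2_000_000))   # ~500K estimated tokens
```

Reserving headroom for the generated output avoids truncating the model's answer when the input nearly fills the window.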