dawndaa/Qwen3-4B-Base
Qwen3-4B-Base is a 4.0 billion parameter causal language model developed by the Qwen team, pre-trained on 36 trillion tokens spanning 119 languages with a 32,768-token context length. The Qwen3 series introduces training refinements such as a global-batch load-balancing loss (for the MoE variants) and QK layer normalization, and uses a three-stage pre-training process to build broad language modeling ability, strengthen reasoning, and extend long-context comprehension. As a foundational base model in the Qwen3 series, it offers improved stability and performance over previous generations.
Qwen3-4B-Base Overview
Qwen3-4B-Base is a 4.0 billion parameter causal language model from the Qwen3 series, pre-trained on an expanded corpus of 36 trillion tokens covering 119 languages. It improves substantially on Qwen2.5 by more than tripling language coverage and raising data quality, with richer coding, STEM, reasoning, and multilingual content. Its 32,768-token context length makes it suitable for tasks that require extensive contextual understanding.
Key Advancements & Capabilities
- Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, with a focus on high-quality data for diverse applications.
- Advanced Training Techniques: Incorporates refinements such as a global-batch load-balancing loss for the MoE variants and QK layer normalization across all models, improving training stability and performance.
- Three-stage Pre-training: A structured approach that first builds general language understanding, then refines reasoning skills (STEM, coding), and finally extends long-context comprehension.
- Scaling Law Guided Tuning: Critical hyperparameters are systematically tuned across pre-training stages for optimal performance across different model scales.
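The QK layer normalization mentioned above can be illustrated with a minimal single-head attention sketch. This is a NumPy toy example, not the model's actual implementation (Qwen3 applies an RMSNorm to query and key heads inside each attention layer); the point is that normalizing q and k before the dot product bounds the attention logits even when activations grow large:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMS-normalize over the last (head) dimension.
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def qk_norm_attention(q, k, v):
    """Single-head attention with QK normalization (illustrative).

    q, k, v: (seq_len, head_dim). After rms_norm, each q/k row has
    norm ~sqrt(head_dim), so each logit is bounded by head_dim
    regardless of activation scale, which stabilizes training.
    """
    q, k = rms_norm(q), rms_norm(k)
    logits = (q @ k.T) / np.sqrt(q.shape[-1])
    # numerically stable softmax over keys
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)) * 100  # deliberately huge activations
k = rng.normal(size=(4, 8)) * 100
v = rng.normal(size=(4, 8))
out = qk_norm_attention(q, k, v)
print(out.shape)  # (4, 8)
```

Without the two `rms_norm` calls, logits on this input would reach the tens of thousands and the softmax would saturate to one-hot weights.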
Model Specifications
- Type: Causal Language Model
- Parameters: 4.0 Billion (3.6B non-embedding)
- Layers: 36
- Attention Heads (GQA): 32 for Q, 8 for KV
- Context Length: 32,768 tokens
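To make the GQA numbers above concrete, here is a back-of-envelope KV-cache estimate. The layer count and KV-head count come from the spec list; the head dimension and fp16 dtype are assumptions for illustration, not stated in this card:

```python
# Rough KV-cache sizing for the configuration above.
num_layers = 36
num_q_heads = 32      # query heads (share KV heads in groups)
num_kv_heads = 8      # GQA: only KV heads are cached
head_dim = 128        # assumed, not stated in the card
bytes_per_value = 2   # assumed fp16
context_len = 32768

# factor of 2 for keys and values
kv_cache_bytes = (2 * num_layers * num_kv_heads * head_dim
                  * bytes_per_value * context_len)
print(f"KV cache per full-length sequence: {kv_cache_bytes / 2**30:.1f} GiB")
# Under these assumptions: 4.5 GiB. With standard multi-head attention
# (32 KV heads instead of 8) the cache would be 4x larger.
```

This 4x reduction (num_q_heads / num_kv_heads) is the main inference benefit of grouped-query attention at long context lengths.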
For detailed evaluation results and further technical information, refer to the official Qwen3 blog and GitHub repository.