dawndaa/Qwen3-4B-Base
Qwen3-4B-Base is a 4.0 billion parameter causal language model developed by the Qwen team, pre-trained on 36 trillion tokens spanning 119 languages with a 32,768-token context length. The Qwen3 series introduces training refinements such as a global-batch load-balancing loss (for the MoE variants) and QK layer normalization, and uses a three-stage pre-training process to build broad language modeling ability, strengthen reasoning, and extend long-context comprehension. As a foundational base model in the Qwen3 series, it offers improved stability and performance over previous generations.
Qwen3-4B-Base Overview
Qwen3-4B-Base is a 4.0 billion parameter causal language model from the Qwen3 series, pre-trained on an expanded corpus of 36 trillion tokens covering 119 languages. It improves substantially on Qwen2.5 by more than tripling language coverage and raising data quality, with richer coding, STEM, reasoning, and multilingual content. Its 32,768-token context length makes it suitable for tasks that require extensive contextual understanding.
Key Advancements & Capabilities
- Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, with a focus on high-quality data for diverse applications.
- Advanced Training Techniques: Incorporates refinements such as a global-batch load-balancing loss for the MoE variants and QK layer normalization across all models, improving training stability and performance.
- Three-stage Pre-training: A structured approach that first builds general language understanding, then refines reasoning skills (STEM, coding), and finally extends long-context comprehension.
- Scaling Law Guided Tuning: Critical hyperparameters are systematically tuned across pre-training stages for optimal performance across different model scales.
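The QK layer normalization mentioned above can be illustrated with a minimal single-head attention sketch. This is a NumPy toy example, not the model's actual implementation (Qwen3 applies an RMSNorm to query and key heads inside each attention layer); the point is that normalizing q and k before the dot product bounds the attention logits even when activations grow large:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMS-normalize over the last (head) dimension.
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def qk_norm_attention(q, k, v):
    """Single-head attention with QK normalization (illustrative).

    q, k, v: (seq_len, head_dim). After rms_norm, each q/k row has
    norm ~sqrt(head_dim), so each logit is bounded by head_dim
    regardless of activation scale, which stabilizes training.
    """
    q, k = rms_norm(q), rms_norm(k)
    logits = (q @ k.T) / np.sqrt(q.shape[-1])
    # numerically stable softmax over keys
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)) * 100  # deliberately huge activations
k = rng.normal(size=(4, 8)) * 100
v = rng.normal(size=(4, 8))
out = qk_norm_attention(q, k, v)
print(out.shape)  # (4, 8)
```

Without the two `rms_norm` calls, logits on this input would reach the tens of thousands and the softmax would saturate to one-hot weights.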
Model Specifications
- Type: Causal Language Model
- Parameters: 4.0 Billion (3.6B non-embedding)
- Layers: 36
- Attention Heads (GQA): 32 for Q, 8 for KV
- Context Length: 32,768 tokens
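To make the GQA numbers above concrete, here is a back-of-envelope KV-cache estimate. The layer count and KV-head count come from the spec list; the head dimension and fp16 dtype are assumptions for illustration, not stated in this card:

```python
# Rough KV-cache sizing for the configuration above.
num_layers = 36
num_q_heads = 32      # query heads (share KV heads in groups)
num_kv_heads = 8      # GQA: only KV heads are cached
head_dim = 128        # assumed, not stated in the card
bytes_per_value = 2   # assumed fp16
context_len = 32768

# factor of 2 for keys and values
kv_cache_bytes = (2 * num_layers * num_kv_heads * head_dim
                  * bytes_per_value * context_len)
print(f"KV cache per full-length sequence: {kv_cache_bytes / 2**30:.1f} GiB")
# Under these assumptions: 4.5 GiB. With standard multi-head attention
# (32 KV heads instead of 8) the cache would be 4x larger.
```

This 4x reduction (num_q_heads / num_kv_heads) is the main inference benefit of grouped-query attention at long context lengths.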
For detailed evaluation results and further technical information, refer to the official Qwen3 blog and GitHub repository.