Qwen3-1.7B-Base Overview
Qwen3-1.7B-Base is a 1.7 billion parameter causal language model from the Qwen3 series, pre-trained on an expanded, high-quality corpus of 36 trillion tokens covering 119 languages and spanning coding, STEM, reasoning, and multilingual data. This represents a significant increase in language coverage and data richness over its predecessor, Qwen2.5.
Key Advancements
- Expanded Pre-training Corpus: Utilizes 36 trillion tokens across 119 languages, with a focus on high-quality data for diverse tasks.
- Architectural and Training Refinements: Adopts a global-batch load-balancing loss (for MoE models) and QK layer normalization for all models, improving training stability and overall performance (see the sketch after this list).
- Three-stage Pre-training: Progresses from general language modeling to specialized reasoning skills (STEM, coding) and finally to long-context comprehension, extending up to 32,768 tokens.
- Scaling Law Guided Tuning: Hyperparameters are systematically tuned across the pre-training pipeline for optimal training dynamics and performance.
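As an illustration of the QK layer normalization mentioned above, the sketch below normalizes query and key projections per head before the attention scores are computed, inside a grouped-query attention block. The module layout, hidden size, and normalization variant are assumptions for illustration and do not reproduce the exact Qwen3 implementation.

```python
# Illustrative PyTorch sketch of QK layer normalization inside a grouped-query
# attention block. Layer shapes and the LayerNorm variant are assumptions, not
# the exact Qwen3 code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, hidden_size: int, num_q_heads: int, num_kv_heads: int):
        super().__init__()
        self.head_dim = hidden_size // num_q_heads
        self.num_q_heads = num_q_heads
        self.num_kv_heads = num_kv_heads
        self.q_proj = nn.Linear(hidden_size, num_q_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_q_heads * self.head_dim, hidden_size, bias=False)
        # QK-Norm: normalize queries and keys per head before the attention
        # scores are formed, bounding their scale and stabilizing training.
        self.q_norm = nn.LayerNorm(self.head_dim)
        self.k_norm = nn.LayerNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_q_heads, self.head_dim)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim)
        q, k = self.q_norm(q), self.k_norm(k)  # QK-Norm applied here
        q, k, v = (z.transpose(1, 2) for z in (q, k, v))  # (b, heads, t, head_dim)
        # Grouped-query attention: each KV head is shared by several query heads.
        rep = self.num_q_heads // self.num_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

# Example with the head counts listed under Model Specifications
# (hidden_size=2048 is an illustrative value, not taken from this section):
attn = QKNormAttention(hidden_size=2048, num_q_heads=16, num_kv_heads=8)
y = attn(torch.randn(1, 8, 2048))
```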
Model Specifications
- Parameters: 1.7 billion (1.4 billion non-embedding)
- Layers: 28
- Attention Heads (GQA): 16 for Q, 8 for KV (see the KV-cache sketch after this list)
- Context Length: 32,768 tokens
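The back-of-the-envelope calculation below shows how these specifications translate into KV-cache memory at the full context length, and why the GQA head grouping matters. The head dimension and the bf16/fp16 cache precision are assumptions for illustration, since neither is stated in this section.

```python
# Rough KV-cache size for the configuration listed above. head_dim and the
# 2-byte (bf16/fp16) cache precision are assumptions for illustration.
num_layers = 28
num_kv_heads = 8          # GQA: 8 KV heads serve 16 query heads
head_dim = 128            # assumed; not stated in this section
context_len = 32_768
bytes_per_value = 2       # bf16/fp16

# Keys and values are cached for every layer, KV head, and position.
kv_cache_bytes = 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_value
print(f"KV cache at full 32k context: {kv_cache_bytes / 2**30:.2f} GiB")  # ~3.5 GiB
# Without grouping (16 KV heads) the cache would be roughly twice as large.
```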
This model is suitable for applications requiring robust language understanding and generation, particularly those that benefit from its extensive multilingual training and enhanced reasoning capabilities within a 32,768-token context window.
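A minimal text-completion sketch with the Hugging Face transformers library is shown below, assuming the checkpoint is published on the Hub as Qwen/Qwen3-1.7B-Base. Since this is a base (non-instruct) model, it is used as a plain text continuer rather than with a chat template.

```python
# Minimal generation sketch with Hugging Face transformers; device_map="auto"
# additionally requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-1.7B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use bf16/fp16 weights when the hardware supports them
    device_map="auto",    # place weights on the available GPU(s) or CPU
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Base models simply continue the prompt; no chat template is applied.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```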