clear-blue-sky/evolai-tfm-004
clear-blue-sky/evolai-tfm-004 is a 1.7-billion-parameter base causal language model from the Qwen3 series, developed by Qwen. Pre-trained on 36 trillion tokens across 119 languages, it benefits from an expanded, higher-quality corpus and architectural refinements such as qk layernorm. Its staged pre-training targets broad language modeling and general knowledge acquisition, then reasoning skills, then long-context comprehension, and it supports a context length of 32,768 tokens.
Qwen3-1.7B-Base Overview
clear-blue-sky/evolai-tfm-004 is a 1.7 billion parameter base causal language model from the Qwen3 series, developed by Qwen. It builds upon previous Qwen models with significant advancements in training data, architecture, and optimization. The model was pre-trained on an expanded corpus of 36 trillion tokens covering 119 languages, tripling the language coverage of its predecessor, Qwen2.5. This dataset includes a rich mix of high-quality data for coding, STEM, reasoning, and multilingual tasks.
Key Features & Improvements
- Expanded Pre-training Corpus: Utilizes 36 trillion tokens across 119 languages, with a focus on high-quality data for diverse tasks.
- Training Techniques and Architecture: Applies qk layernorm across Qwen3 models for improved training stability and overall performance. The series also uses a global-batch load-balancing loss, but that applies only to its MoE models, not to this dense 1.7B model.
- Three-stage Pre-training: Progresses from general knowledge acquisition to enhanced reasoning skills (STEM, coding) and finally to improved long-context comprehension, supporting up to 32k tokens.
- Systematic Hyperparameter Tuning: Employs scaling law studies to optimize hyperparameters across the pre-training stages for better training dynamics.
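The qk layernorm idea is to normalize each head's query and key vectors before their dot product, so attention logits cannot grow without bound as activations scale up. Below is a minimal pure-Python sketch of that idea for a single head. It assumes an RMSNorm whose learnable gain is fixed to 1 and leaves out rotary position embeddings and the softmax; the function names are hypothetical, and this is an illustration of the general technique, not Qwen's implementation.

```python
import math
from typing import List


def rms_norm(x: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm over one head's vector; the learnable gain is assumed to be 1 here."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def attention_logit(q: List[float], k: List[float], qk_norm: bool = True) -> float:
    """Scaled dot-product logit for one query/key pair in a single head.

    With qk_norm=True, q and k are RMS-normalized first (the qk layernorm idea);
    rotary embeddings and the softmax are omitted from this sketch.
    """
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


if __name__ == "__main__":
    q = [30.0, -40.0, 10.0, 20.0]
    k = [25.0, -35.0, 5.0, 15.0]
    big_q = [100.0 * v for v in q]  # same direction, 100x larger activations

    for name, query in (("q", q), ("100*q", big_q)):
        raw = attention_logit(query, k, qk_norm=False)
        normed = attention_logit(query, k, qk_norm=True)
        print(f"{name}: raw logit = {raw:.1f}, qk-normed logit = {normed:.4f}")
```

Scaling the query by 100 multiplies the unnormalized logit by 100, while the normalized logit stays essentially the same. With the norm in place, the logit for head dimension d is bounded by sqrt(d), which is the stability property the technique is meant to provide.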
Model Specifications
- Parameters: 1.7 billion (1.4 billion non-embedding)
- Layers: 28
- Attention Heads (GQA): 16 for Q, 8 for KV
- Context Length: 32,768 tokens
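Grouped-query attention with 16 query heads and 8 KV heads means each key/value head is shared by two query heads, which halves the key/value cache compared with giving every query head its own KV pair. The sketch below works this out in plain Python. The layer and head counts come from the specification list above; the head dimension of 128 and the 2-byte (bf16/fp16) cache entries are assumptions not stated in this card, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttentionConfig:
    num_layers: int = 28       # from the spec list above
    num_q_heads: int = 16      # from the spec list above
    num_kv_heads: int = 8      # from the spec list above
    head_dim: int = 128        # ASSUMPTION: not stated in this card
    bytes_per_value: int = 2   # ASSUMPTION: bf16/fp16 cache entries

    def __post_init__(self) -> None:
        if self.num_q_heads % self.num_kv_heads != 0:
            raise ValueError("GQA needs num_q_heads to be a multiple of num_kv_heads")


def kv_head_for_query_head(q_head: int, cfg: AttentionConfig) -> int:
    """Index of the shared KV head that query head `q_head` reads under GQA."""
    group_size = cfg.num_q_heads // cfg.num_kv_heads  # query heads per KV head
    return q_head // group_size


def kv_cache_bytes(seq_len: int, cfg: AttentionConfig, kv_heads: Optional[int] = None) -> int:
    """Bytes to cache keys and values for one sequence of `seq_len` tokens."""
    heads = cfg.num_kv_heads if kv_heads is None else kv_heads
    # Factor 2: one key vector and one value vector per KV head, per layer, per token.
    per_token = 2 * cfg.num_layers * heads * cfg.head_dim * cfg.bytes_per_value
    return seq_len * per_token


if __name__ == "__main__":
    cfg = AttentionConfig()
    print("query head -> KV head:", [kv_head_for_query_head(h, cfg) for h in range(cfg.num_q_heads)])
    gqa = kv_cache_bytes(32_768, cfg)
    mha = kv_cache_bytes(32_768, cfg, kv_heads=cfg.num_q_heads)
    print(f"GQA cache: {gqa / 2**30:.2f} GiB; one KV head per query head: {mha / 2**30:.2f} GiB")
```

Under those assumptions, one full 32,768-token sequence needs 3.5 GiB of KV cache (112 KiB per token), versus 7 GiB if each of the 16 query heads had its own KV head. Actual memory use depends on the inference framework and the cache dtype it uses.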
For detailed evaluation results and further information, refer to the official Qwen3 blog and GitHub repository.