clear-blue-sky/evolai-tfm-001
Qwen3-1.7B-Base is a 1.7-billion-parameter causal language model developed by Qwen, pre-trained on 36 trillion tokens across 119 languages. Its three-stage pre-training process covers general knowledge, then reasoning skills (STEM, coding), then long-context comprehension up to 32,768 tokens. The model incorporates architectural refinements such as qk layernorm and is intended as a general-purpose base for language modeling.
Qwen3-1.7B-Base Overview
Qwen3-1.7B-Base is a 1.7-billion-parameter causal language model from the Qwen series and an advancement over Qwen2.5. It is pre-trained on an expanded, higher-quality corpus of 36 trillion tokens covering 119 languages, which broadens its multilingual coverage. The model also integrates architectural refinements, including qk layernorm, to improve stability and performance.
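To illustrate why normalizing queries and keys can help stability, here is a minimal NumPy sketch. It assumes the normalization is an RMS-style norm over each head's dimension, applied to queries and keys before the dot product; the learned scale and positional embeddings are omitted, and the function names `rms_norm` and `attention_scores` are hypothetical, not taken from the Qwen code.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """Scale the last axis to unit root-mean-square (learned gain omitted)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def attention_scores(q, k, use_qk_norm=True):
    """Scaled dot-product attention logits for arrays shaped (heads, seq, head_dim)."""
    if use_qk_norm:
        # Assumption: the norm is applied per head, over head_dim.
        q, k = rms_norm(q), rms_norm(k)
    head_dim = q.shape[-1]
    return q @ np.swapaxes(k, -1, -2) / np.sqrt(head_dim)

rng = np.random.default_rng(0)
q = 50.0 * rng.standard_normal((2, 4, 8))  # deliberately large activations
k = 50.0 * rng.standard_normal((2, 4, 8))

raw = attention_scores(q, k, use_qk_norm=False)
normed = attention_scores(q, k, use_qk_norm=True)
max_raw = float(np.abs(raw).max())
max_normed = float(np.abs(normed).max())
```

With normalization, each logit is bounded by the square root of the head dimension regardless of how large the activations grow, which keeps the softmax from saturating. Without it, the logits scale with the product of the query and key magnitudes.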
Key Training & Architectural Features
- Expanded Pre-training Corpus: Utilizes 36 trillion tokens across 119 languages, with a rich mix of high-quality data for coding, STEM, reasoning, and multilingual tasks.
- Three-stage Pre-training: Progresses from broad language modeling to specialized reasoning skills (STEM, coding, logical reasoning) and finally to enhanced long-context comprehension, supporting sequence lengths up to 32,768 tokens.
- Architectural Refinements: Adds training techniques and architectural changes such as qk layernorm to improve stability and performance.
- Context Length: 32,768 tokens.
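The 32,768-token limit covers the prompt and the generated tokens together, so long inputs may need trimming before generation. The sketch below shows one simple policy, keeping the most recent tokens; the helper name `fit_to_context` and the truncation strategy are illustrative assumptions, not part of the model's tooling.

```python
MAX_CONTEXT_TOKENS = 32_768  # context length stated in this card

def fit_to_context(token_ids, max_new_tokens, max_context=MAX_CONTEXT_TOKENS):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    if not 0 <= max_new_tokens < max_context:
        raise ValueError("max_new_tokens must be in [0, max_context)")
    budget = max_context - max_new_tokens  # tokens left for the prompt
    return token_ids[-budget:]

# A 40,000-token prompt is trimmed from the left to leave room for 512 new tokens.
prompt = list(range(40_000))
trimmed = fit_to_context(prompt, max_new_tokens=512)
```

Dropping the oldest tokens suits continuation-style prompts. Other workloads, such as question answering over a document, may prefer to keep the beginning or to chunk the input instead.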
Good For
- Applications requiring broad language understanding and generation across 119 languages.
- Tasks benefiting from improved reasoning skills in STEM and coding.
- Use cases demanding long-context comprehension.
- Developers seeking a base model for further fine-tuning on specific tasks.
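As a starting point for experimentation or fine-tuning, the following sketch loads the checkpoint with the Hugging Face `transformers` library and produces a plain-text continuation. It assumes the identifier at the top of this card is a valid Hugging Face Hub repository id and that `transformers` and PyTorch are installed; the helper name `complete` is hypothetical.

```python
MODEL_ID = "clear-blue-sky/evolai-tfm-001"  # assumption: a Hugging Face Hub repo id

def complete(prompt, max_new_tokens=64, model_id=MODEL_ID):
    """Load the checkpoint and return a plain-text continuation of `prompt`."""
    # Imported lazily so the heavy dependency is only needed when this is called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True)

# Example call (downloads the weights on first use):
# print(complete("The capital of France is"))
```

Because this is a base model rather than an instruction-tuned one, the sketch passes raw text and does not apply a chat template.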