clear-blue-sky/evolai-tfm-003
The clear-blue-sky/evolai-tfm-003 model, developed by Qwen, is a 1.7 billion parameter causal language model from the Qwen3 series. It was pre-trained on 36 trillion tokens across 119 languages, drawing on an expanded, high-quality corpus, and includes architectural refinements such as qk layernorm. Its pre-training targets broad language modeling, general knowledge acquisition, and reasoning skills, and it supports a context length of up to 32,768 tokens.
Qwen3-1.7B-Base Overview
Qwen3-1.7B-Base is a 1.7 billion parameter causal language model in the Qwen3 series, the latest generation of Qwen large language models. Compared with previous Qwen versions, it incorporates improvements in training data, model architecture, and optimization techniques.
Key Enhancements & Features
- Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, which triples the language coverage of Qwen2.5. The corpus mixes high-quality coding, STEM, reasoning, book, multilingual, and synthetic data.
- Architectural Refinements: The Qwen3 series uses a global-batch load balancing loss for its MoE models and qk layernorm for all models, which improves training stability and overall performance. Of these, qk layernorm is the refinement this card attributes to this model.
- Three-stage Pre-training: Pre-training proceeds in three stages. Stage 1 focuses on broad language modeling and general knowledge acquisition, stage 2 improves reasoning skills (STEM, coding, logical reasoning), and stage 3 enhances long-context comprehension by extending training sequence lengths up to 32,768 tokens.
- Context Length: Supports a context length of up to 32,768 tokens.
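To illustrate the qk layernorm refinement mentioned in the list, here is a minimal pure-Python sketch of the general idea: normalize each query and key vector before the scaled dot product, so attention logits stay bounded. It assumes an RMS-style normalization over the head dimension with no learned gain. The card does not specify the variant, and the names `rms_norm` and `attention_weights` are illustrative, not taken from the Qwen3 code.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Rescale a vector so its root-mean-square is approximately 1.

    Assumption: RMS-style normalization without a learned gain; the
    card only says "qk layernorm" and does not give the exact form.
    """
    mean_sq = sum(x * x for x in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, use_qk_norm=True):
    """Attention weights of one query over several keys (single head)."""
    if use_qk_norm:
        # Normalize the query and every key before taking dot products.
        query = rms_norm(query)
        keys = [rms_norm(k) for k in keys]
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


# Toy activations with very different magnitudes (made-up numbers).
query = [100.0, -50.0, 25.0, 10.0]
keys = [
    [80.0, -40.0, 20.0, 5.0],
    [-1.0, 2.0, 0.5, 0.1],
    [0.2, 0.1, -0.3, 0.4],
]

plain = attention_weights(query, keys, use_qk_norm=False)
normed = attention_weights(query, keys, use_qk_norm=True)
print("without qk norm:", plain)
print("with qk norm:   ", normed)
```

Without normalization, the large-magnitude query and key produce a huge logit and the softmax saturates on a single key. With normalization, each logit is bounded in magnitude by roughly the square root of the head dimension, which is one common explanation for the stability benefit.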
Good For
- Applications requiring broad language understanding and generation across many languages.
- Tasks benefiting from strong reasoning capabilities, including STEM and coding-related problems.
- Use cases demanding long-context comprehension and processing.
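For long-context use, the prompt and the generated tokens share the same 32,768-token window. The sketch below shows one way to budget for that. `fit_to_context` and `generate_text` are hypothetical helpers, and the loading code assumes that `transformers` and `torch` are installed and that this repository loads through the standard `AutoTokenizer` and `AutoModelForCausalLM` classes, which the card does not confirm.

```python
MODEL_ID = "clear-blue-sky/evolai-tfm-003"  # repository name from this card
MAX_CONTEXT = 32_768  # context length stated in this card


def fit_to_context(token_ids, max_new_tokens, max_context=MAX_CONTEXT):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    if not 0 < max_new_tokens < max_context:
        raise ValueError("max_new_tokens must be between 1 and max_context - 1")
    budget = max_context - max_new_tokens
    # Negative slicing keeps the tail; shorter prompts are returned unchanged.
    return list(token_ids[-budget:])


def generate_text(prompt, max_new_tokens=64):
    """Hypothetical helper: load the model and continue a prompt.

    Assumption: the repository is loadable with the Auto classes.
    Imports are local so that fit_to_context works without these packages.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    ids = fit_to_context(tokenizer(prompt)["input_ids"], max_new_tokens)
    input_ids = torch.tensor([ids])
    output = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][len(ids):], skip_special_tokens=True)
```

Dropping the oldest tokens is only one possible truncation policy. Summarizing or chunking the input may suit some long-document tasks better. Since this is a base model, `generate_text` continues the prompt as plain text rather than following a chat template.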
For detailed evaluation results and further information, refer to the Qwen3 blog and GitHub repository.