mihai-777/evolai-tfm-1p5b
The mihai-777/evolai-tfm-1p5b model is a 1.7-billion-parameter causal language model from the Qwen3 series, developed by Qwen. It is a base (pre-trained, not instruction-tuned) model with a 32,768-token context length, trained on an expanded 36-trillion-token corpus covering 119 languages. Its three-stage pre-training process moves from broad language modeling to reasoning skills to long-context comprehension, making it suitable for general language understanding and generation tasks.
Qwen3-1.7B-Base Overview
This model, mihai-777/evolai-tfm-1p5b, is a 1.7-billion-parameter causal language model from the Qwen3 series, developed by Qwen, which was the newest generation of Qwen models at its release. It is pre-trained on 36 trillion tokens across 119 languages, a substantial increase in linguistic coverage and data quality over earlier Qwen generations, and its data mix includes specialized data for coding, STEM, reasoning, and multilingual tasks.
Key Capabilities & Features
- Expanded Pre-training Corpus: Uses a 36-trillion-token dataset covering 119 languages, strengthening multilingual and domain-specific understanding.
- Training Techniques and Architecture: The Qwen3 series uses a global-batch load-balancing loss for its MoE (mixture-of-experts) models and QK layer normalization (normalizing attention queries and keys before their dot product) for all models, improving training stability and overall performance. This 1.7B model is dense, so only the QK normalization applies to it.
- Three-stage Pre-training: Progresses from general language modeling, to improved reasoning skills (STEM, coding), to long-context comprehension with training sequences of up to 32,768 tokens.
- Scaling-Law-Guided Hyperparameter Tuning: Uses scaling law studies across the pre-training stages to systematically tune critical hyperparameters, such as the learning-rate scheduler and batch size, for better training dynamics and performance across model scales.
- Causal Language Model: Predicts each token from only the tokens before it (left to right), for text generation and understanding.
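QK layer normalization can be illustrated with a minimal pure-Python sketch. This is a toy under stated assumptions, not Qwen3's code: it uses RMS normalization (a LayerNorm variant without mean-centering) with the learnable gain fixed at 1, on a single 4-dimensional attention head with made-up activations. It shows why normalizing queries and keys helps stability: once both are normalized, the scaled logit q·k/√d is bounded by √d no matter how large the raw activations grow.

```python
import math

def rms_norm(x, eps=1e-6):
    """RMS-normalize a vector; the learnable gain is omitted (fixed to 1) in this sketch."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def attention_logit(q, k, qk_norm=True):
    """Scaled dot-product logit q.k / sqrt(d), optionally normalizing q and k first."""
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    d = len(q)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(d)

# Made-up large activations, of the kind that can appear when training drifts.
q = [100.0, -50.0, 80.0, 30.0]
k = [90.0, -60.0, 70.0, 20.0]

raw = attention_logit(q, k, qk_norm=False)
normed = attention_logit(q, k, qk_norm=True)
print(f"without qk-norm: {raw:.1f}")
print(f"with qk-norm:    {normed:.3f}")  # always within [-2, 2] here, since sqrt(d) = 2
```

Scaling `q` by 1,000 multiplies the unnormalized logit by the same factor but leaves the normalized logit essentially unchanged, which is the stabilizing effect the bullet refers to.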
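Scaling-law-guided tuning is commonly done by finding the best value of a hyperparameter at several small scales, fitting a power law, and extrapolating to the target scale. The sketch below fits y = a·x^b by least squares in log-log space. It is a generic illustration, not Qwen's published procedure: the "measurements" are synthetic values generated from an assumed law, and modeling learning rate against parameter count is itself an assumption.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via linear regression on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    # Slope of the log-log line is the exponent b; the intercept gives log(a).
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic (made-up) "best learning rate" per model size, generated from
# lr = 0.02 * N ** -0.2 purely to demonstrate the fit; these are not Qwen3 numbers.
sizes = [1e8, 3e8, 1e9, 3e9]
best_lr = [0.02 * n ** -0.2 for n in sizes]

a, b = fit_power_law(sizes, best_lr)
predicted = a * 1.7e9 ** b  # extrapolate to a 1.7B-parameter model
print(f"a={a:.4g}, b={b:.3f}, predicted lr at 1.7B: {predicted:.3g}")
```

Fitting in log space turns the power law into a straight line, so ordinary linear regression recovers the exponent; with real, noisy measurements the fit would only approximate the underlying trend.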
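The causal property can be sketched in pure Python as a mask over attention scores: position i may attend only to positions j ≤ i, so future positions receive zero weight after the softmax. The scores below are arbitrary toy values; real models apply the mask to batched tensors (typically by adding a large negative number before the softmax) rather than row by row as this sketch does.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, giving masked (future) positions zero weight."""
    allowed = [s for s, ok in zip(scores, mask_row) if ok]
    m = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

n = 4
mask = causal_mask(n)
scores = [[0.5, 1.0, -0.3, 2.0] for _ in range(n)]  # toy scores, identical for every row
weights = [masked_softmax(scores[i], mask[i]) for i in range(n)]
for row in weights:
    print(" ".join(f"{w:.2f}" for w in row))
```

The first token can only attend to itself, so its row is all weight on position 0, and every row's weights above the diagonal are exactly zero.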
Good For
- General language understanding and generation tasks.
- Applications requiring broad multilingual support.
- Tasks benefiting from extended context comprehension.
- Further fine-tuning for specific downstream applications.
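For inputs longer than the 32,768-token context, one common approach is to split the token sequence into overlapping windows. The sketch below assumes the text has already been tokenized with the model's tokenizer (not shown); the `sliding_windows` helper and its 1,024-token overlap are hypothetical choices for illustration. When generating, leave room in each window for the new tokens, since the prompt and the output share the same context length.

```python
MAX_CONTEXT = 32_768  # context length stated on this model card

def sliding_windows(token_ids, window=MAX_CONTEXT, overlap=1_024):
    """Split token_ids into windows of at most `window` tokens (hypothetical helper).

    Consecutive windows share `overlap` tokens so text near a boundary keeps
    some preceding context.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if not token_ids:
        return []
    stride = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the sequence
        start += stride
    return chunks

# Stand-in token IDs for a long document (real IDs would come from the tokenizer).
tokens = list(range(70_000))
chunks = sliding_windows(tokens)
print([len(c) for c in chunks])  # → [32768, 32768, 6512]
```

A larger overlap preserves more context across boundaries at the cost of processing more tokens in total.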