mihai-777/evolai-tfm-1p5b-05

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: May 11, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

mihai-777/evolai-tfm-1p5b-05 is a 1.7 billion parameter causal language model based on the Qwen3 architecture developed by Qwen. This pre-trained base model was trained on a corpus of 36 trillion tokens spanning 119 languages and supports a context length of 32,768 tokens. Its training and architectural refinements make it suitable for broad language modeling and general knowledge acquisition.


Qwen3-1.7B-Base Overview

mihai-777/evolai-tfm-1p5b-05 is a 1.7 billion parameter pre-trained causal language model from the Qwen3 series, developed by Qwen. It builds on advances in training data, architecture, and optimization techniques, and is described as more stable in training and stronger in performance than previous Qwen iterations.

Key Features & Improvements

  • Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, tripling the language coverage of Qwen2.5. The dataset mixes high-quality data for coding, STEM, reasoning, and multilingual tasks.
  • Advanced Training Techniques: Combines architectural refinements such as QK layernorm with a three-stage pre-training process. Stage one covers broad language modeling, stage two targets reasoning skills (STEM, coding), and stage three extends long-context comprehension to sequences of up to 32k tokens.
  • Optimized Architecture: Uses 28 layers with grouped-query attention (16 query heads sharing 8 key/value heads) and has 1.4 billion non-embedding parameters.
  • Scaling Law Guided Tuning: Hyperparameters were systematically tuned using comprehensive scaling law studies to optimize training dynamics and performance across different model scales.
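The architecture figures in the list above can be sanity-checked with a little arithmetic. The sketch below estimates the non-embedding parameter count and the BF16 key/value cache size at the full 32,768 token context. The layer count, head counts, and context length come from this card; the hidden size, head dimension, and MLP width are assumptions taken from the published Qwen3-1.7B configuration, and the layer layout (bias-free projections, gated MLP, RMSNorm weights) is likewise assumed rather than stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchConfig:
    # Stated in this card.
    num_layers: int = 28
    num_q_heads: int = 16
    num_kv_heads: int = 8
    max_context: int = 32_768
    bytes_per_value: int = 2  # BF16
    # ASSUMPTIONS: not stated in this card (published Qwen3-1.7B config values).
    hidden_size: int = 2048
    head_dim: int = 128
    intermediate_size: int = 6144


def non_embedding_params(cfg: ArchConfig) -> int:
    """Count weights outside the token embedding, assuming bias-free projections."""
    q_out = cfg.num_q_heads * cfg.head_dim
    kv_out = cfg.num_kv_heads * cfg.head_dim
    attn = cfg.hidden_size * q_out          # q_proj
    attn += 2 * cfg.hidden_size * kv_out    # k_proj and v_proj (smaller under GQA)
    attn += q_out * cfg.hidden_size         # o_proj
    qk_norm = 2 * cfg.head_dim              # QK layernorm weights
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size  # gate, up, down
    layer_norms = 2 * cfg.hidden_size       # pre-attention and pre-MLP norms
    per_layer = attn + qk_norm + mlp + layer_norms
    return cfg.num_layers * per_layer + cfg.hidden_size  # plus final norm


def kv_cache_bytes(cfg: ArchConfig, seq_len: int) -> int:
    """Bytes needed to cache keys and values for one sequence of seq_len tokens."""
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_value
    return per_token * seq_len


cfg = ArchConfig()
print(f"query heads per KV head: {cfg.num_q_heads // cfg.num_kv_heads}")
print(f"non-embedding params:    {non_embedding_params(cfg) / 1e9:.2f}B")
print(f"KV cache at full context: {kv_cache_bytes(cfg, cfg.max_context) / 2**30:.1f} GiB")
```

Under these assumptions the estimate comes to about 1.41 billion non-embedding parameters, consistent with the 1.4 billion quoted above, and a full-context cache of 3.5 GiB per sequence. With 16 key/value heads instead of 8 the cache would be twice as large, which is the practical benefit of sharing each key/value head between two query heads.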

Use Cases

This base model is designed for broad language modeling and general knowledge acquisition. It is not instruction-tuned, so it is best used as a foundation for further fine-tuning or for applications that need language understanding and generation across multiple languages and domains.
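As a starting point for experimentation, the following is a minimal sketch of plain text continuation with the Hugging Face Transformers library. It assumes the weights are available on the Hugging Face Hub under the repository name shown on this card and that the installed Transformers release supports Qwen3; neither is confirmed here. The helper name `generate_continuation` is hypothetical. Because this is a base model, the prompt is passed as raw text rather than through a chat template.

```python
# Repository name from this card; availability on the Hub is an assumption.
MODEL_ID = "mihai-777/evolai-tfm-1p5b-05"


def generate_continuation(prompt: str, max_new_tokens: int = 64) -> str:
    """Return the prompt followed by a greedy continuation from the base model."""
    # Imported lazily so that importing this module stays cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed on this card.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for reproducible output
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate_continuation("The three primary colors are"))
```

Greedy decoding is used only to keep the sketch deterministic; sampling parameters would normally be tuned for the target application.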