mihai-777/evolai-tfm-1p5b-v5

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 24, 2026 · Architecture: Transformer · Cold

mihai-777/evolai-tfm-1p5b-v5 is a 1.7 billion parameter causal language model from the Qwen3 series, pre-trained on 36 trillion tokens across 119 languages. Its expanded, high-quality pre-training corpus has a rich mix of coding, STEM, reasoning, and multilingual data. The model uses architectural refinements such as qk layernorm and was trained with a three-stage pre-training process. With a 32,768 token context length, it is suited to broad language modeling, general knowledge tasks, and reasoning.
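As a rough sizing aid, the parameter count and BF16 format listed above allow a back-of-envelope estimate of the memory needed for the weights alone. The sketch below assumes 2 bytes per BF16 parameter and ignores activations, the KV cache, and framework overhead; the function name `weight_memory_gib` is hypothetical and used only for illustration.

```python
# Bytes used to store one parameter in each numeric format.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return the approximate size of the model weights in GiB.

    Only the weights are counted; runtime memory (activations, KV cache)
    comes on top of this figure.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # 1.7 billion parameters in BF16: roughly 3.2 GiB of weights.
    print(f"{weight_memory_gib(1.7e9):.2f} GiB")
```

Actual memory use at inference time will be higher and grows with the number of tokens held in context.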


Qwen3-1.7B-Base Overview

mihai-777/evolai-tfm-1p5b-v5 is a 1.7 billion parameter causal language model, part of the Qwen3 series. It builds on the Qwen2.5 generation with changes to both its training data and its architecture. It was pre-trained on a corpus of 36 trillion tokens covering 119 languages, with a diverse mix of high-quality coding, STEM, reasoning, and multilingual content.
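A minimal sketch of loading the model for plain text continuation, assuming the repository is published in a format that the Hugging Face `transformers` library can load with `AutoModelForCausalLM` (this card does not confirm that). Because this is a base model, the example passes a raw prompt rather than a chat template. The `generate` helper is hypothetical.

```python
MODEL_ID = "mihai-777/evolai-tfm-1p5b-v5"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with the model (weights are downloaded on first call)."""
    # Imported lazily so the heavy dependencies are only needed when generating.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: BF16 matches the "Quant: BF16" entry in the page header.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("The capital of France is"))
```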

Key Improvements & Features

  • Expanded Pre-training Corpus: Utilizes a significantly larger and higher-quality dataset, tripling language coverage compared to Qwen2.5.
  • Architectural Refinements: Uses refined training techniques and architectural changes, such as qk layernorm (normalizing queries and keys before the attention dot product), to improve training stability and performance.
  • Three-stage Pre-training: Training proceeds in three stages: broad language modeling first, then a focus on reasoning skills (STEM, coding), and finally long-context comprehension up to 32,768 tokens.
  • Optimized Hyperparameter Tuning: Leverages comprehensive scaling law studies to systematically tune hyperparameters for improved training dynamics.
  • Context Length: Supports a substantial context window of 32,768 tokens.
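To illustrate the qk layernorm idea from the list above, here is a small single-head sketch in plain Python. It assumes RMS normalization over the head dimension with the learnable gain omitted; the card names only the technique, so the exact normalization variant and every function name here are assumptions, not the model's actual code.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Rescale a vector to unit root-mean-square (learnable gain omitted)."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [x * scale for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, use_qk_norm=True):
    """Attention weights of one query over several keys for a single head."""
    head_dim = len(query)
    if use_qk_norm:
        # Normalize query and keys before the dot product, so the logits
        # no longer depend on the raw magnitude of either vector.
        query = rms_norm(query)
        keys = [rms_norm(k) for k in keys]
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
        for key in keys
    ]
    return softmax(scores)


if __name__ == "__main__":
    keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    small = [1.0, 2.0, 3.0, 4.0]
    large = [x * 1000 for x in small]
    print(attention_weights(large, keys, use_qk_norm=False))  # saturates
    print(attention_weights(large, keys, use_qk_norm=True))   # stays smooth
```

With normalization, scaling the query by a large factor leaves the attention weights essentially unchanged, whereas without it the softmax collapses onto a single key. Bounding the logits this way is the usual motivation given for the technique's effect on training stability.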

Use Cases

This model is well-suited for applications requiring:

  • General language understanding and generation.
  • Tasks involving STEM, coding, and logical reasoning.
  • Processing and understanding long-form text, supported by the 32,768 token context length.
  • Multilingual applications across 119 languages.
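For long-form inputs, the prompt and the generated tokens share the same 32,768 token window, so it can help to cap the generation length before calling the model. The helper below is a sketch under that assumption; `clamp_max_new_tokens` is a hypothetical name, and the prompt token count is expected to come from whichever tokenizer is used with the model.

```python
CONTEXT_LENGTH = 32_768  # context window stated in this card


def clamp_max_new_tokens(prompt_tokens: int, requested_new_tokens: int,
                         context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can be generated without overflowing the window."""
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested_new_tokens, remaining)


if __name__ == "__main__":
    # A 30,000 token prompt leaves room for 2,768 new tokens.
    print(clamp_max_new_tokens(30_000, 4_096))
```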