mihai-777/evolai-tfm-1p5b

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Apr 21, 2026 | Architecture: Transformer | Cold

The mihai-777/evolai-tfm-1p5b model is a 1.7 billion parameter causal language model from the Qwen3 series, developed by Qwen. It is a base pre-trained model with a 32,768-token context length, trained on a 36-trillion-token corpus covering 119 languages. Pre-training runs in three stages that target broad language modeling, reasoning skills, and long-context comprehension in turn, which makes the model suitable for general language understanding and generation tasks.


Qwen3-1.7B-Base Overview

This model, mihai-777/evolai-tfm-1p5b, is a 1.7 billion parameter causal language model from the Qwen3 series, developed by Qwen. Qwen3 is the latest generation of Qwen models. The model is pre-trained on a corpus of 36 trillion tokens across 119 languages, with broader linguistic coverage and higher data quality than earlier generations, and the corpus includes specialized data for coding, STEM, reasoning, and multilingual tasks.

Key Capabilities & Features

  • Expanded Pre-training Corpus: Utilizes a 36 trillion token dataset covering 119 languages, enhancing its multilingual and domain-specific understanding.
  • Advanced Training Techniques: The Qwen3 series incorporates training and architectural refinements, including a global-batch load balancing loss (used for the MoE models in the series) and qk layernorm for improved stability and performance.
  • Three-stage Pre-training: Progresses from general language modeling to enhanced reasoning skills (STEM, coding) and finally to long-context comprehension, supporting up to 32,768 tokens.
  • Optimized Hyperparameter Tuning: Benefits from scaling law studies to systematically tune hyperparameters for better training dynamics and performance across different model scales.
  • Causal Language Model: Designed for sequential text generation and understanding.
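The qk layernorm item above can be illustrated with a small sketch. It normalizes the query and key vectors of a single attention head before taking their dot product, so the attention logits no longer depend on the raw magnitude of either vector. The RMS-style normalization, the function names, and the toy vectors are assumptions made for illustration; this is not the model's actual implementation.

```python
import math

def rms_norm(vec, weight=None, eps=1e-6):
    """Scale vec so its root-mean-square is ~1, then apply an optional gain."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(vec)  # identity gain (hypothetical default)
    return [x * inv_rms * w for x, w in zip(vec, weight)]

def qk_norm_attention_weights(query, keys):
    """Softmax attention weights for one query over several keys (one head),
    normalizing the query and each key before the scaled dot product."""
    d = len(query)
    q = rms_norm(query)
    logits = [
        sum(a * b for a, b in zip(q, rms_norm(k))) / math.sqrt(d)
        for k in keys
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vectors (illustrative only).
query = [0.5, -1.0, 2.0, 0.25]
keys = [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, -1.0, 0.5]]

base = qk_norm_attention_weights(query, keys)
# Scaling the inputs by 100 leaves the weights essentially unchanged.
scaled = qk_norm_attention_weights(
    [100 * x for x in query], [[100 * x for x in k] for k in keys]
)
```

Because both vectors are normalized, the logits stay bounded regardless of how large the activations grow, which is one common explanation for the stability benefit.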

Good For

  • General language understanding and generation tasks.
  • Applications requiring broad multilingual support.
  • Tasks benefiting from extended context comprehension.
  • Further fine-tuning for specific downstream applications.