mihai-777/evolai-tfm-1p5b-alt

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Apr 22, 2026 | Architecture: Transformer | Cold

Qwen3-1.7B-Base is a 1.7 billion parameter causal language model developed by Qwen, pre-trained on 36 trillion tokens across 119 languages. This model incorporates architectural refinements and a three-stage pre-training process, focusing on broad language modeling, reasoning skills, and long-context comprehension up to 32,768 tokens. It is designed for general language tasks, leveraging an expanded, high-quality pre-training corpus including coding, STEM, and multilingual data.


Qwen3-1.7B-Base Overview

Qwen3-1.7B-Base is the 1.7 billion parameter base (pre-trained) causal language model in the Qwen3 series, the successor to Qwen2.5. It supports a context length of 32,768 tokens and was pre-trained on 36 trillion tokens across 119 languages, a larger and more multilingual corpus than earlier Qwen releases used.

Key Improvements and Features

  • Expanded Pre-training Corpus: Trained on a broader mix of high-quality data (coding, STEM, reasoning, books, multilingual, and synthetic data), with language coverage tripled relative to Qwen2.5.
  • Architectural Refinements: The Qwen3 series uses a global-batch load-balancing loss for its MoE models and QK layernorm for all models to improve training stability and performance. Qwen3-1.7B-Base is a dense model, so only the QK layernorm applies here.
  • Three-stage Pre-training: A structured approach where Stage 1 focuses on general knowledge, Stage 2 on reasoning (STEM, coding, logical reasoning), and Stage 3 on long-context comprehension.
  • Scaling Law Guided Tuning: Critical hyperparameters are systematically tuned for dense and MoE models across the pre-training pipeline to optimize training dynamics and final performance.
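
The QK layernorm mentioned above can be illustrated with a minimal sketch. It assumes an RMS-style normalization applied to each query and key vector before the dot product, with the learned gain omitted; the function names and toy vectors are hypothetical and are not taken from the Qwen3 code. The point it shows is that normalized queries and keys give attention logits that no longer grow with the magnitude of the activations, which is one common explanation for the stability benefit.

```python
import math

def rms_norm(vec, eps=1e-6):
    """Scale a vector to unit root-mean-square (learned gain omitted)."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [x * scale for x in vec]

def attention_logit(q, k, use_qk_norm=True):
    """Scaled dot-product logit for one query/key pair of a single head."""
    if use_qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))

# Hypothetical toy vectors: the same query direction at two magnitudes.
q_small = [0.1, -0.2, 0.3, 0.4]
q_large = [x * 100 for x in q_small]
k = [0.5, 0.1, -0.3, 0.2]

# Without normalization the logit scales with the query magnitude.
raw_ratio = (attention_logit(q_large, k, use_qk_norm=False)
             / attention_logit(q_small, k, use_qk_norm=False))

# With normalization both queries give (almost) the same logit.
normed_small = attention_logit(q_small, k)
normed_large = attention_logit(q_large, k)

print(f"raw logit ratio (large / small): {raw_ratio:.1f}")
print(f"normalized logits: {normed_small:.4f} vs {normed_large:.4f}")
```

With normalization, the magnitude of each logit is bounded by the square root of the head dimension, regardless of how large the underlying activations become during training.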

Model Specifications

  • Type: Causal Language Model
  • Parameters: 1.7 billion (1.4 billion non-embedding)
  • Layers: 28
  • Attention Heads (GQA): 16 for Q, 8 for KV
  • Context Length: 32,768 tokens
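
As a rough illustration of what these specifications imply for inference, the sketch below maps query heads to shared KV heads and estimates the key/value cache size for one sequence at full context. The layer count, head counts, context length, and BF16 precision come from this page; the per-head dimension of 128 is an assumption not stated here, and the helper names are hypothetical.

```python
NUM_LAYERS = 28
NUM_Q_HEADS = 16
NUM_KV_HEADS = 8
CONTEXT_LENGTH = 32_768
HEAD_DIM = 128        # assumption: not stated in this card
BYTES_PER_VALUE = 2   # BF16 stores each value in 2 bytes

def kv_head_for(q_head):
    """Index of the KV head that a given query head shares under GQA."""
    group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 2 query heads per KV head
    return q_head // group_size

def kv_cache_bytes(seq_len, num_kv_heads=NUM_KV_HEADS):
    """KV cache size in bytes for one sequence of seq_len tokens."""
    # The leading 2 counts one key tensor and one value tensor per layer.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * seq_len * BYTES_PER_VALUE

gqa_bytes = kv_cache_bytes(CONTEXT_LENGTH)
mha_bytes = kv_cache_bytes(CONTEXT_LENGTH, num_kv_heads=NUM_Q_HEADS)

print([kv_head_for(h) for h in range(NUM_Q_HEADS)])
print(f"KV cache with 8 KV heads (GQA): {gqa_bytes / 2**30:.2f} GiB")
print(f"KV cache with 16 KV heads:      {mha_bytes / 2**30:.2f} GiB")
```

Under these assumptions, sharing each KV head between two query heads halves the cache compared with giving every query head its own keys and values.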

For detailed evaluation results and further information, refer to the Qwen3 blog and GitHub repository.