clear-blue-sky/evolai-tfm-002

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: May 7, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

Qwen3-1.7B-Base is a 1.7-billion-parameter causal language model developed by the Qwen Team as part of the Qwen3 series. It is pre-trained on 36 trillion tokens spanning 119 languages and uses qk layernorm, one of the series' training refinements (the other, global-batch load balancing loss, applies only to the MoE models). Its pre-training targets broad language modeling, general knowledge acquisition, and reasoning, and it supports a context length of 32,768 tokens.


Qwen3-1.7B-Base Overview

Qwen3-1.7B-Base is a 1.7-billion-parameter causal language model from the Qwen3 series, developed by the Qwen Team. It improves on earlier Qwen releases with a substantially expanded pre-training corpus and several training and architectural refinements.

Key Capabilities and Features

  • Expanded Pre-training Corpus: Trained on an extensive 36 trillion tokens across 119 languages, tripling the language coverage of Qwen2.5. The dataset includes a rich mix of high-quality data for coding, STEM, reasoning, and multilingual tasks.
  • Advanced Training Techniques: Incorporates training and architectural refinements, including global-batch load balancing loss (used by the MoE models in the series) and qk layernorm (used by all models, including this dense one), to improve training stability and overall performance.
  • Three-stage Pre-training: Stage 1 covers broad language modeling and general knowledge acquisition; Stage 2 improves reasoning skills such as STEM, coding, and logical reasoning; Stage 3 enhances long-context comprehension by extending training sequence lengths up to 32,768 tokens.
  • Scaling-Law-Guided Hyperparameter Tuning: Uses scaling law studies across the three-stage pipeline to tune critical hyperparameters, such as the learning rate scheduler and batch size, separately for dense and MoE models, leading to better training dynamics.
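To illustrate what qk layernorm does, the sketch below normalizes each head's query and key vectors independently before the scaled dot product, so attention logits no longer grow with raw activation magnitude. It is a minimal pure-Python sketch, not Qwen3's implementation: it assumes RMSNorm as the normalization, uses all-ones weights in place of learned per-dimension scales, and the function names are hypothetical.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one head's vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def attention_logit(q, k, q_weight, k_weight, use_qk_norm=True):
    """Scaled dot-product logit for a single query/key pair in one head."""
    head_dim = len(q)
    if use_qk_norm:
        # Normalize query and key independently before the dot product.
        q = rms_norm(q, q_weight)
        k = rms_norm(k, k_weight)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim)


if __name__ == "__main__":
    head_dim = 8
    ones = [1.0] * head_dim  # stand-in for learned per-dimension scales
    # Deliberately large-magnitude toy activations.
    q = [100.0 * (i + 1) for i in range(head_dim)]
    k = [50.0 * (i + 1) for i in range(head_dim)]
    print("raw logit:   ", attention_logit(q, k, ones, ones, use_qk_norm=False))
    print("normed logit:", attention_logit(q, k, ones, ones))
```

With unit weights, the Cauchy-Schwarz inequality bounds the normalized logit by sqrt(head_dim) (about 2.83 here), whereas the raw logit scales with the product of the query and key magnitudes.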

Model Specifications

  • Parameters: 1.7 billion (1.4 billion non-embedding)
  • Layers: 28
  • Attention Heads (GQA): 16 for Q, 8 for KV
  • Context Length: 32,768 tokens
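With 16 query heads and 8 key/value heads, each pair of query heads shares one KV head (16 / 8 = 2). The following minimal pure-Python sketch shows causal grouped-query attention under that head layout; the nested-list tensor layout, toy sizes, and function names are illustrative assumptions, and it omits positional encoding, qk normalization, and the input/output projections.

```python
import math
import random

NUM_Q_HEADS = 16   # from the model card
NUM_KV_HEADS = 8   # from the model card
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS  # query heads per shared KV head


def kv_head_for(q_head, group_size=GROUP_SIZE):
    """Index of the key/value head that a given query head reads from."""
    return q_head // group_size


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gqa_attention(q, k, v):
    """Causal grouped-query attention (no positional encoding, no projections).

    q:    [num_q_heads][seq_len][head_dim]
    k, v: [num_kv_heads][seq_len][head_dim]
    Returns [num_q_heads][seq_len][head_dim].
    """
    num_q_heads, seq_len, head_dim = len(q), len(q[0]), len(q[0][0])
    group_size = num_q_heads // len(k)
    out = []
    for h in range(num_q_heads):
        kh = kv_head_for(h, group_size)  # shared KV head for this query head
        head_out = []
        for t in range(seq_len):
            # Causal mask: position t attends only to positions 0..t.
            scores = [
                sum(a * b for a, b in zip(q[h][t], k[kh][s])) / math.sqrt(head_dim)
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            head_out.append([
                sum(w * v[kh][s][d] for s, w in enumerate(weights))
                for d in range(head_dim)
            ])
        out.append(head_out)
    return out


def random_tensor(rng, heads, seq_len, head_dim):
    """Gaussian toy tensor shaped [heads][seq_len][head_dim]."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(head_dim)]
             for _ in range(seq_len)] for _ in range(heads)]


if __name__ == "__main__":
    rng = random.Random(0)
    seq_len, head_dim = 4, 8  # toy sizes, not the model's real dimensions
    q = random_tensor(rng, NUM_Q_HEADS, seq_len, head_dim)
    k = random_tensor(rng, NUM_KV_HEADS, seq_len, head_dim)
    v = random_tensor(rng, NUM_KV_HEADS, seq_len, head_dim)
    out = gqa_attention(q, k, v)
    print(len(out), len(out[0]), len(out[0][0]))  # 16 4 8
    print([kv_head_for(h) for h in range(NUM_Q_HEADS)])
```

Only the 8 KV heads need to be stored per token, which is why GQA reduces KV-cache memory relative to giving every query head its own keys and values.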

Further Information

For detailed evaluation results, hardware requirements, and inference performance, refer to the official Qwen3 blog and GitHub repository.
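As a rough starting point before consulting those sources, the arithmetic below estimates BF16 memory for the weights and for a full 32,768-token KV cache. It is a back-of-envelope sketch: HEAD_DIM = 128 is an assumption not stated on this card, and the estimate ignores activations, framework overhead, and any quantization.

```python
# Back-of-envelope BF16 memory estimate for Qwen3-1.7B-Base.
# Constants marked "card" come from the model card; HEAD_DIM is an ASSUMPTION.

GIB = 2**30
BYTES_PER_BF16 = 2

TOTAL_PARAMS = 1_700_000_000   # card: 1.7 billion parameters
NUM_LAYERS = 28                # card
NUM_KV_HEADS = 8               # card: GQA, 8 heads for KV
MAX_CONTEXT = 32_768           # card
HEAD_DIM = 128                 # ASSUMPTION: per-head dimension, not on this card


def weight_bytes(num_params=TOTAL_PARAMS, bytes_per_param=BYTES_PER_BF16):
    """Bytes needed to hold the weights alone."""
    return num_params * bytes_per_param


def kv_cache_bytes(seq_len, batch_size=1, num_kv_heads=NUM_KV_HEADS,
                   bytes_per_elem=BYTES_PER_BF16):
    """Bytes for cached keys and values across all layers.

    Each layer stores 2 tensors (K and V) of shape
    [batch_size, num_kv_heads, seq_len, HEAD_DIM].
    """
    return (2 * NUM_LAYERS * num_kv_heads * HEAD_DIM
            * seq_len * batch_size * bytes_per_elem)


if __name__ == "__main__":
    print(f"weights:            {weight_bytes() / GIB:.2f} GiB")
    print(f"KV cache (8 KV):    {kv_cache_bytes(MAX_CONTEXT) / GIB:.2f} GiB")
    print(f"KV cache (16 KV):   {kv_cache_bytes(MAX_CONTEXT, num_kv_heads=16) / GIB:.2f} GiB")
```

Under these assumptions, one full-context sequence needs about 3.5 GiB of KV cache on top of roughly 3.2 GiB of weights; with 16 KV heads instead of 8, the cache would double to 7 GiB.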