clear-blue-sky/evolai-tfm-001

Text Generation
Concurrency Cost: 1
Model Size: 2B
Quant: BF16
Ctx Length: 32k
Published: May 6, 2026
License: apache-2.0
Architecture: Transformer
Open Weights
Cold

Qwen3-1.7B-Base is a 1.7 billion parameter causal language model developed by Qwen, pre-trained on 36 trillion tokens across 119 languages. It features a three-stage pre-training process focusing on general knowledge, reasoning skills (STEM, coding), and long-context comprehension up to 32k tokens. This model incorporates architectural refinements like qk layernorm and is designed for broad language modeling and general knowledge acquisition.


Qwen3-1.7B-Base Overview

Qwen3-1.7B-Base is a 1.7 billion parameter causal language model from the Qwen series and the successor generation to Qwen2.5. It is pre-trained on an expanded, higher-quality corpus of 36 trillion tokens covering 119 languages, which broadens its multilingual coverage. The model also integrates architectural refinements, including qk layernorm, to improve training stability and performance.
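To make the stability claim for qk layernorm concrete, here is a minimal sketch of the general idea: normalizing the query and key vectors of a head before their dot product keeps the attention logit bounded, whatever the input magnitudes. The card only names the technique, so the choice of an RMS-style norm, the omission of a learned gain, and the example vectors are all assumptions made for illustration, not a description of the model's actual implementation.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Scale a vector to (approximately) unit root-mean-square.

    Assumption: an RMS-style norm without a learned gain parameter.
    """
    mean_sq = sum(x * x for x in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [x * scale for x in vec]


def attention_logit(q, k, normalize=True):
    """Scaled dot-product logit for one query/key pair of a single head."""
    if normalize:
        q, k = rms_norm(q), rms_norm(k)
    dot = sum(a * b for a, b in zip(q, k))
    return dot / math.sqrt(len(q))


# Hypothetical head vectors with large magnitudes.
q = [40.0, -30.0, 20.0, 10.0]
k = [35.0, -25.0, 15.0, 5.0]

raw_logit = attention_logit(q, k, normalize=False)
normed_logit = attention_logit(q, k, normalize=True)

print(f"raw logit:        {raw_logit:.1f}")
print(f"normalized logit: {normed_logit:.3f}")
```

Without normalization the logit grows with the product of the two vector norms, which can saturate the softmax. With normalization its absolute value cannot exceed the square root of the head dimension.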

Key Training & Architectural Features

  • Expanded Pre-training Corpus: Utilizes 36 trillion tokens across 119 languages, with a rich mix of high-quality data for coding, STEM, reasoning, and multilingual tasks.
  • Three-stage Pre-training: Progresses from broad language modeling to specialized reasoning skills (STEM, coding, logical reasoning) and finally to enhanced long-context comprehension, supporting sequence lengths up to 32,768 tokens.
  • Architectural Refinements: Applies training techniques and architectural changes such as qk layernorm to improve stability and performance.
  • Context Length: 32,768 tokens.
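In practice, the 32,768-token limit covers the prompt and the generated tokens together. The helper below is a small sketch of how a caller might cap a generation request to stay inside that window. The function name and its parameters are hypothetical, and token counts are assumed to come from whatever tokenizer the caller already uses.

```python
MAX_CONTEXT_TOKENS = 32_768  # context length stated in this card


def max_new_tokens_allowed(prompt_tokens, requested_new_tokens,
                           limit=MAX_CONTEXT_TOKENS):
    """Return how many new tokens can be generated without exceeding the limit.

    Hypothetical helper: prompt and generated tokens share one context window.
    """
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= limit:
        return 0  # the prompt alone already fills (or overflows) the window
    return min(requested_new_tokens, limit - prompt_tokens)


# A 30,000-token prompt leaves room for at most 2,768 new tokens.
print(max_new_tokens_allowed(30_000, 4_096))
```

A return value of 0 signals that the prompt itself needs to be truncated or split before generation can start.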

Good For

  • Applications requiring broad language understanding and generation across 119 languages.
  • Tasks benefiting from improved reasoning skills in STEM and coding.
  • Use cases demanding long-context comprehension.
  • Developers seeking a base model for further fine-tuning on specific tasks.