clear-blue-sky/evolai-tfm-006

Text generation | Model size: 2B | Quant: BF16 | Context length: 32k | Published: May 7, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights

Qwen3-1.7B-Base is a 1.7 billion parameter causal language model developed by Qwen, pre-trained on 36 trillion tokens across 119 languages. It features an expanded, high-quality pre-training corpus, architectural refinements like qk layernorm, and a three-stage pre-training process focusing on broad language modeling, reasoning skills, and long-context comprehension up to 32,768 tokens. This base model is designed for general language understanding and generation tasks, benefiting from systematic hyperparameter tuning for improved stability and performance.


Qwen3-1.7B-Base Overview

Qwen3-1.7B-Base is a 1.7 billion parameter causal language model from the Qwen3 series, developed by Qwen. It is pre-trained on a corpus of 36 trillion tokens covering 119 languages, a substantial expansion over previous iterations, with a richer mix of high-quality data including coding, STEM, reasoning, and multilingual content. Qwen3 training also uses refinements aimed at stability and performance: a global-batch load-balancing loss for the series' Mixture-of-Experts (MoE) models, and qk layernorm for all models. Because Qwen3-1.7B-Base is a dense model, only the latter applies to it directly.
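qk layernorm normalizes the query and key vectors before their dot product, which bounds the size of the attention logits no matter how large the raw projections grow. The sketch below is a minimal single-head illustration in pure Python, not Qwen's implementation. It assumes RMSNorm-style normalization over the head dimension with learned per-dimension weights (the exact normalization used in Qwen3 may differ), omits positional encodings and multi-head plumbing, and the function names are hypothetical.

```python
import math

def rms_norm(vec, weight, eps=1e-6):
    # RMSNorm: divide by the root-mean-square of the vector, then apply a learned per-dimension weight
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [w * x / rms for w, x in zip(weight, vec)]

def causal_attention_with_qk_norm(q, k, v, q_weight, k_weight):
    """Single-head causal attention with QK-norm (hypothetical sketch).

    q, k, v: lists of seq_len rows, each a list of head_dim floats.
    q_weight, k_weight: per-dimension norm weights of length head_dim.
    """
    head_dim = len(q[0])
    # Normalize every query and key row before computing scores
    qn = [rms_norm(row, q_weight) for row in q]
    kn = [rms_norm(row, k_weight) for row in k]
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for i in range(len(qn)):
        # Causal mask: position i attends only to positions 0..i
        scores = [scale * sum(a * b for a, b in zip(qn[i], kn[j])) for j in range(i + 1)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # subtract max for a numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        # Output row is the probability-weighted average of the visible value rows
        out.append([sum(p * v[j][d] for j, p in enumerate(probs)) for d in range(len(v[0]))])
    return out
```

Because each normalized row has a root-mean-square of about 1 (with unit weights), scaling the raw queries by 1000 leaves the attention output essentially unchanged; without the normalization, the same scaling would push the softmax toward a one-hot distribution.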

Key Capabilities & Features

  • Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, with a focus on diverse, high-quality data.
  • Three-stage Pre-training: Progresses from general language modeling to specialized reasoning skills (STEM, coding, logical reasoning) and enhanced long-context comprehension.
  • Long Context Window: Supports a context length of up to 32,768 tokens, improving its ability to process and understand longer inputs.
  • Architectural Refinements: Includes qk layernorm and other techniques for improved training stability and performance.
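  • Scaling Law Guided Tuning: Critical hyperparameters, such as the learning-rate schedule and batch size, are tuned from scaling-law studies across the three pre-training stages, separately for dense and MoE models, to improve training dynamics and final performance at each model scale.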
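Scaling-law-guided tuning generally means measuring the best value of a hyperparameter on small runs, fitting a power law, and extrapolating to the target scale. The sketch below is a generic illustration of that idea, not Qwen's published procedure: it fits y = a · x^b by least squares in log-log space. The function names and the synthetic learning-rate numbers in the usage note are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y).

    Returns (a, b). All xs and ys must be positive (math.log raises
    ValueError otherwise), and the xs must not all be equal.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    b = sxy / sxx                  # slope in log-log space is the exponent
    a = math.exp(my - b * mx)      # intercept gives the prefactor
    return a, b

def predict(a, b, x):
    # Extrapolate the fitted power law to a new scale x
    return a * x ** b
```

For example, with synthetic "best learning rates" generated as `0.3 * n ** -0.25` for `n` in `[1e8, 2e8, 4e8, 8e8]` parameters, `fit_power_law` recovers `a ≈ 0.3` and `b ≈ -0.25`, and `predict(a, b, 1.7e9)` extrapolates to a 1.7B-parameter model. Real sweeps are noisy, so in practice one would also inspect the fit quality before trusting an extrapolation.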

When to Use This Model

Qwen3-1.7B-Base is suited to applications that need a capable base model for general language understanding and generation. Because it is a pre-trained base model rather than an instruction-tuned one, it is typically used with completion-style (for example, few-shot) prompts or as a starting point for fine-tuning. Its extensive multilingual training and its focus on reasoning and long-context comprehension make it a reasonable candidate for tasks such as text summarization, content creation, and multilingual applications where a smaller, efficient model is desired. For detailed evaluation results and further information, refer to the official Qwen3 blog and GitHub repository.
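A base model that is not instruction-tuned is usually steered with a completion-style prompt: a few worked examples followed by an open slot for the model to fill. The sketch below builds a few-shot summarization prompt and drops the oldest examples until the prompt, plus room for generation, fits the 32,768-token context. The `Document:`/`Summary:` layout is an assumption, not an official Qwen format. The whitespace-based `approx_token_count` is a hypothetical stand-in; in practice, pass a counter built on the model's own tokenizer (for example, `lambda t: len(tokenizer(t)["input_ids"])` with a Hugging Face tokenizer).

```python
CONTEXT_LENGTH = 32_768  # maximum context length stated on this card

def approx_token_count(text):
    # Hypothetical crude stand-in for the real tokenizer: counts whitespace-separated words
    return len(text.split())

def build_few_shot_prompt(examples, query, max_new_tokens=256,
                          count_tokens=approx_token_count,
                          context_length=CONTEXT_LENGTH):
    """Build a completion-style summarization prompt for a base model.

    examples: list of (document, summary) pairs, oldest first.
    Oldest examples are dropped until the prompt leaves room for
    max_new_tokens generated tokens within context_length.
    """
    budget = context_length - max_new_tokens
    # The open slot the model is expected to complete
    tail = f"Document: {query}\nSummary:"
    if count_tokens(tail) > budget:
        raise ValueError("query alone exceeds the context budget")
    blocks = [f"Document: {d}\nSummary: {s}\n\n" for d, s in examples]
    # Drop the oldest examples until everything fits
    while blocks and count_tokens("".join(blocks) + tail) > budget:
        blocks.pop(0)
    return "".join(blocks) + tail
```

The resulting string would be passed to the model as a plain completion prompt, with generation stopped at the next blank line or `Document:` marker. Keeping the most recent examples is an arbitrary choice here; ranking examples by similarity to the query is a common alternative.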