Qwen/Qwen3-1.7B-Base

Public · 2B · BF16 · 40960 · License: apache-2.0

Qwen3-1.7B-Base Overview

Qwen3-1.7B-Base is a 1.7-billion-parameter causal language model from the Qwen3 series, pre-trained on an expanded, high-quality corpus of 36 trillion tokens covering 119 languages. Compared with its predecessor, Qwen2.5, this represents a significant increase in language coverage and data richness, with a larger share of coding, STEM, reasoning, and multilingual data.

Key Advancements

  • Expanded Pre-training Corpus: Utilizes 36 trillion tokens across 119 languages, with a focus on high-quality data for diverse tasks.
  • Architectural Refinements: Incorporates training techniques such as a global-batch load-balancing loss (for MoE models) and QK layernorm for all models, improving stability and overall performance (a conceptual sketch of QK layernorm follows this list).
  • Three-stage Pre-training: Progresses from general language modeling to specialized reasoning skills (STEM, coding) and finally to long-context comprehension, extending up to 32,768 tokens.
  • Scaling Law Guided Tuning: Hyperparameters are systematically tuned across the pre-training pipeline for optimal training dynamics and performance.
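
To make the QK layernorm point concrete, here is a minimal conceptual sketch of an attention block that normalizes query and key heads before the score computation, using the 16 Q / 8 KV grouped-query layout listed under Model Specifications. This is illustrative only, not Qwen3's actual implementation: rotary embeddings, KV caching, and other details are omitted, the dimensions are example values, and it assumes PyTorch 2.4+ for nn.RMSNorm.

```python
# Conceptual sketch: RMS-normalize query and key heads before attention ("QK layernorm").
# Illustrative only; not the exact Qwen3 implementation (RoPE, caching, etc. omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, dim=2048, n_heads=16, n_kv_heads=8):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)
        # per-head RMSNorm on queries and keys (the "QK layernorm")
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim)
        q, k = self.q_norm(q), self.k_norm(k)  # normalize before computing scores
        # grouped-query attention: each KV head is shared by 2 query heads (16 Q / 8 KV)
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=2)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=2)
        q, k, v = (z.transpose(1, 2) for z in (q, k, v))  # (b, heads, t, head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

# quick shape check
y = QKNormAttention()(torch.randn(1, 8, 2048))  # -> (1, 8, 2048)
```

Normalizing queries and keys per head bounds the magnitude of attention logits, which is why it is credited with improving training stability.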

Model Specifications

  • Parameters: 1.7 billion (1.4 billion non-embedding)
  • Layers: 28
  • Attention Heads (GQA): 16 for Q, 8 for KV
  • Context Length: 32,768 tokens
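
These specifications can be checked programmatically against the published configuration. The sketch below assumes a recent transformers release with Qwen3 support and uses the usual config field names for Qwen-style models.

```python
# Inspect the model configuration to verify the specifications listed above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen3-1.7B-Base")
print(config.num_hidden_layers)        # expected: 28
print(config.num_attention_heads)      # expected: 16 (query heads)
print(config.num_key_value_heads)      # expected: 8 (KV heads, GQA)
print(config.max_position_embeddings)  # native context window
```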

This model suits applications that require robust language understanding and generation, benefiting from its extensive multilingual training and enhanced reasoning data within a 32,768-token context window.
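
A minimal text-completion sketch with the transformers library follows. Because this is a base (pre-trained, not instruction-tuned) model, it is prompted as plain text rather than with a chat template; the prompt and generation settings are arbitrary examples, and a recent transformers release with Qwen3 support is assumed.

```python
# Minimal text-completion sketch for the base model (plain-text prompting, no chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-1.7B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights
    device_map="auto",
)

prompt = "The three laws of thermodynamics are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```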