Qwen/Qwen3-1.7B-Base
Text generation · Model size: 2B (1.7B parameters) · Quantization: BF16 · Context length: 32k · Published: Apr 28, 2025 · License: apache-2.0 · Architecture: Transformer

Qwen/Qwen3-1.7B-Base is a 1.7 billion parameter causal language model developed by Qwen, pre-trained on 36 trillion tokens across 119 languages. This model incorporates architectural refinements and a three-stage pre-training process to enhance reasoning, coding, and long-context comprehension up to 32,768 tokens. It is designed for broad language modeling and general knowledge acquisition, with a focus on improved stability and performance across diverse tasks.


Qwen3-1.7B-Base Overview

Qwen3-1.7B-Base is a 1.7 billion parameter causal language model from the Qwen3 series, pre-trained on an expanded, high-quality corpus of 36 trillion tokens covering 119 languages. This corpus, which includes coding, STEM, reasoning, and multilingual data, represents a significant increase in language coverage and data richness over its predecessor, Qwen2.5.

Key Advancements

  • Expanded Pre-training Corpus: Utilizes 36 trillion tokens across 119 languages, with a focus on high-quality data for diverse tasks.
  • Architectural Refinements: Incorporates training techniques such as a global-batch load-balancing loss (for MoE variants) and QK layer normalization across all models, enhancing stability and overall performance.
  • Three-stage Pre-training: Progresses from general language modeling to specialized reasoning skills (STEM, coding) and finally to long-context comprehension, extending up to 32,768 tokens.
  • Scaling Law Guided Tuning: Hyperparameters are systematically tuned across the pre-training pipeline for optimal training dynamics and performance.

Model Specifications

  • Parameters: 1.7 billion (1.4 billion non-embedding)
  • Layers: 28
  • Attention Heads (GQA): 16 for Q, 8 for KV
  • Context Length: 32,768 tokens
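The specifications above allow a rough memory estimate for serving this model. The sketch below assumes a per-head dimension of 128 (typical for Qwen-family models, but not stated in this card) and BF16 weights at 2 bytes per value:

```python
# Back-of-envelope memory estimate from the specs above.
# Assumptions (not stated in the card): head_dim = 128, BF16 = 2 bytes/value.

BYTES_BF16 = 2
params = 1.7e9          # total parameters
layers = 28
kv_heads = 8            # GQA key/value heads
head_dim = 128          # assumed; typical for Qwen-family models
ctx = 32_768            # maximum context length

# Weights in BF16: one 2-byte value per parameter.
weight_bytes = params * BYTES_BF16

# KV cache per token: K and V, per layer, per KV head, per head dimension.
kv_per_token = 2 * layers * kv_heads * head_dim * BYTES_BF16

# Full-context KV cache for a single sequence.
kv_full = kv_per_token * ctx

print(f"weights ~ {weight_bytes / 1e9:.1f} GB")
print(f"KV cache ~ {kv_per_token / 1024:.0f} KiB/token, "
      f"{kv_full / 1e9:.1f} GB at 32k context")
# → weights ~ 3.4 GB; KV cache ~ 112 KiB/token, 3.8 GB at 32k context
```

Under these assumptions the model's weights and a single full-length KV cache each land in the 3-4 GB range, so a 32k-context deployment needs noticeably more memory than the weights alone.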

This model suits applications requiring robust language understanding and generation, benefiting in particular from its extensive multilingual training and enhanced reasoning within its 32,768-token context window.