unsloth/Qwen3-1.7B-Base

1.7B parameters · BF16 · 40,960-token context window · License: apache-2.0

Qwen3-1.7B-Base Overview

Qwen3-1.7B-Base is the 1.7 billion parameter base (pre-trained, non-instruct) causal language model from the Qwen3 series, the latest generation of Qwen large language models. It builds on extensive advancements in training data, model architecture, and optimization techniques, offering significant improvements over previous Qwen iterations.

Key Capabilities & Features

  • Expanded Pre-training Corpus: Trained on an extensive 36 trillion tokens across 119 languages, tripling the language coverage of its predecessor. The corpus includes a rich mix of high-quality data, such as coding, STEM, reasoning, books, multilingual text, and synthetic data.
  • Architectural Refinements: Incorporates advanced training techniques and architectural improvements, including QK-LayerNorm across all model sizes, for enhanced training stability and overall performance.
  • Three-stage Pre-training: Utilizes a structured pre-training approach:
    • Stage 1: Focuses on broad language modeling and general knowledge acquisition.
    • Stage 2: Improves reasoning skills, including STEM, coding, and logical reasoning.
    • Stage 3: Enhances long-context comprehension by extending training sequence lengths up to 32,768 tokens (see the configuration check after this list).
  • Scaling Law Guided Hyperparameter Tuning: Critical hyperparameters were systematically tuned through comprehensive scaling law studies across the pre-training pipeline, optimizing training dynamics and final performance.
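
The configured context window can be read directly from the model configuration. A minimal sketch, assuming the standard Hugging Face transformers API and that the unsloth/Qwen3-1.7B-Base repository is reachable:

```python
# Minimal sketch: inspect the configured context window of the base model.
# Assumes the transformers library is installed and the repo id below is accessible.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("unsloth/Qwen3-1.7B-Base")

# max_position_embeddings is the maximum sequence length the model is configured for;
# Stage 3 of pre-training extended training sequence lengths up to 32,768 tokens.
print(config.max_position_embeddings)
```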

Use Cases

This model is well-suited for applications requiring robust general language understanding, multilingual capabilities, and enhanced reasoning across various domains. Its extended context length makes it particularly effective for tasks involving longer texts and complex information processing.
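
As a base (non-instruct) model, it is typically used for text continuation or as a starting point for fine-tuning. A minimal generation sketch, assuming the standard Hugging Face transformers API; the prompt and generation settings below are illustrative only:

```python
# Minimal generation sketch for the base model; it continues text rather than
# following chat-style instructions. Assumes transformers and accelerate are
# installed and that bfloat16 is supported on the target device.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Qwen3-1.7B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For instruction following or chat-style use, a post-trained Qwen3-1.7B variant is generally the better starting point; the base model is best suited for completion-style prompting and further fine-tuning.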