unsloth/Qwen3-4B-Base

4B parameters · BF16 · 40,960-token context · License: apache-2.0

Qwen3-4B-Base Overview

Qwen3-4B-Base is a 4.0-billion-parameter causal language model from the Qwen3 series, developed by the Qwen team. It builds on significant advancements over its predecessors in training data, model architecture, and optimization techniques. The model was pre-trained on a corpus of 36 trillion tokens covering 119 languages, a substantial increase in both quantity and quality, spanning diverse data types such as code, STEM, reasoning, and multilingual content.
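
For orientation, here is a minimal loading and text-continuation sketch using the Hugging Face transformers library. The repository id comes from this card; the generation settings and the assumption of a recent transformers release with Qwen3 support are illustrative.

```python
# Minimal sketch: load unsloth/Qwen3-4B-Base with transformers and run plain
# text continuation. Assumes a recent transformers release with Qwen3 support
# and a GPU with enough memory for the BF16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Qwen3-4B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# This is a base (non-instruct) model, so prompt it as plain text to continue,
# not with a chat template.
prompt = "Large language models are pre-trained on"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```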

Key Capabilities & Features

  • Expanded Pre-training Corpus: Utilizes a significantly larger and higher-quality dataset, tripling language coverage compared to Qwen2.5.
  • Architectural Refinements: Incorporates training techniques and architectural improvements like qk layernorm for enhanced stability and performance.
  • Three-stage Pre-training: Employs a staged approach focusing on broad language modeling, reasoning skills (STEM, coding, logical reasoning), and long-context comprehension, extending sequence lengths up to 32,768 tokens.
  • Scaling Law Guided Tuning: Hyperparameters were systematically tuned using scaling law studies across the pre-training pipeline for optimal performance.
  • Context Length: Supports a context length of 32,768 tokens (see the token-count sketch below).
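
As referenced in the context-length item above, the following sketch checks the model's configured position limit and measures a document's token count against it. The file name and the reuse of the previously loaded model and tokenizer are assumptions made for illustration.

```python
# Sketch: compare an input's token count against the model's configured
# position limit before generation. Reuses `model` and `tokenizer` from the
# loading example; "long_report.txt" is a hypothetical document.
max_positions = model.config.max_position_embeddings
print("configured max positions:", max_positions)

with open("long_report.txt") as f:
    long_doc = f.read()

ids = tokenizer(long_doc, return_tensors="pt")
n_tokens = ids["input_ids"].shape[-1]
print(f"document is {n_tokens} tokens; fits: {n_tokens <= max_positions}")
```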

When to Use This Model

Qwen3-4B-Base is suitable for applications requiring a robust base model with strong general language understanding, multilingual capabilities, and improved reasoning. Its extensive pre-training on diverse data makes it a solid foundation for fine-tuning on various downstream tasks, particularly those benefiting from broad knowledge and long-context processing.
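
As one illustration of downstream fine-tuning, below is a hedged LoRA sketch using the peft library; the rank, alpha, dropout, and target modules are example values rather than settings prescribed by this card.

```python
# Sketch: attach LoRA adapters to the loaded base model with the peft library.
# Hyperparameters and target modules are illustrative choices, not prescribed
# settings; training itself (data, Trainer, optimizer) is omitted.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                          # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)  # wraps the model loaded earlier
model.print_trainable_parameters()          # only adapter weights are trainable
```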