ChuGyouk/Qwen3-4B-Base
Source: Hugging Face · Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Dec 22, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

Qwen3-4B-Base is a 4.0 billion parameter causal language model from the Qwen series, pre-trained on 36 trillion tokens across 119 languages. This model incorporates architectural refinements and a three-stage pre-training process to enhance general knowledge, reasoning skills, and long-context comprehension up to 32,768 tokens. It is designed for broad language modeling tasks, with a focus on improved stability and performance across diverse data types including coding, STEM, and multilingual content.


Qwen3-4B-Base Overview

Qwen3-4B-Base is a 4.0 billion parameter causal language model, part of the latest generation in the Qwen series. It builds upon previous iterations with significant advancements in its training data, model architecture, and optimization techniques. The model was pre-trained on an expanded, higher-quality corpus of 36 trillion tokens covering 119 languages, a substantial increase in linguistic diversity compared to Qwen2.5. This dataset includes a rich mix of coding, STEM, reasoning, book, multilingual, and synthetic data.
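Because this is a base (non-instruction-tuned) checkpoint, it is typically used as a plain text-completion model. The snippet below is a minimal sketch of loading and running it with the Hugging Face transformers library; the repository id is taken from this page's header, while the BF16 dtype, automatic device placement, and generation settings are illustrative assumptions rather than documented requirements.

```python
# Minimal text-completion sketch using the Hugging Face transformers API.
# Assumes a recent transformers release, the accelerate package for
# device_map="auto", and enough GPU memory for a 4B-parameter BF16 model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ChuGyouk/Qwen3-4B-Base"  # repo id from this page's header

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",
)

# Base models expect a plain text prefix to continue, not a chat-formatted prompt.
prompt = "The three laws of thermodynamics state that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```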

Key Enhancements & Capabilities

  • Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, significantly broadening its linguistic and domain coverage.
  • Architectural Refinements: Incorporates advanced training techniques and architectural improvements from the Qwen3 generation, such as a global-batch load-balancing loss (for the MoE variants in the series) and QK-LayerNorm, enhancing training stability and overall performance.
  • Three-stage Pre-training: This structured approach focuses on:
    • Stage 1: Broad language modeling and general knowledge acquisition.
    • Stage 2: Improved reasoning skills, including STEM, coding, and logical reasoning.
    • Stage 3: Enhanced long-context comprehension, extending training sequence lengths up to 32,768 tokens.
  • Context Length: Supports a 32,768-token context window, beneficial for tasks requiring extensive input understanding (a sketch after this list shows one way to keep long prompts within that window).
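The 32,768-token window covers most documents, but longer inputs still need to be bounded before generation. Below is a minimal sketch, assuming the tokenizer and model loaded in the earlier example; the 512-token generation budget and the left-truncation strategy are illustrative choices, not part of the model card.

```python
# Sketch: keep long inputs within the 32,768-token context window before generating.
# Reuses the `tokenizer` and `model` objects from the snippet above.
import torch

MAX_CONTEXT = 32_768   # context length listed for this model
MAX_NEW_TOKENS = 512   # illustrative generation budget (an assumption)

def encode_within_window(tokenizer, text: str) -> torch.Tensor:
    """Tokenize `text` and, if needed, drop the oldest tokens so the prompt
    plus the generation budget still fits inside the context window."""
    budget = MAX_CONTEXT - MAX_NEW_TOKENS
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    if input_ids.shape[1] > budget:
        input_ids = input_ids[:, -budget:]  # left-truncate: keep the most recent context
    return input_ids

# Usage with the previously loaded model:
# input_ids = encode_within_window(tokenizer, long_document).to(model.device)
# output_ids = model.generate(input_ids, max_new_tokens=MAX_NEW_TOKENS)
```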

Good For

  • Applications requiring robust general language understanding and generation.
  • Tasks benefiting from strong reasoning capabilities in STEM and coding domains.
  • Use cases demanding long-context comprehension.
  • Multilingual applications due to its extensive language coverage.