agadelmoula-avey/Qwen3-4B-Base is a 4.0-billion-parameter causal language model from the Qwen3 series, pre-trained by the Qwen Team on an expanded corpus of 36 trillion tokens covering 119 languages. The base model incorporates architectural refinements and a three-stage pre-training process to strengthen broad language modeling, reasoning, and long-context comprehension up to 32,768 tokens. It is intended for general knowledge acquisition and foundational language understanding tasks.
# Qwen3-4B-Base Overview
Qwen3-4B-Base is a 4.0 billion parameter causal language model, part of the Qwen3 series developed by the Qwen Team. This model builds upon significant advancements in training data, architecture, and optimization techniques compared to its predecessor, Qwen2.5.
## Key Capabilities & Features
- Expanded Pre-training Corpus: Trained on an extensive 36 trillion tokens across 119 languages, tripling the language coverage of Qwen2.5. The dataset includes a rich mix of high-quality data for coding, STEM, reasoning, and multilingual tasks.
- Architectural Refinements: Incorporates training techniques and architectural improvements such as `qk layernorm` for enhanced stability and performance.
- Three-stage Pre-training: Utilizes a staged approach focusing on broad language modeling, improving reasoning skills (STEM, coding, logical reasoning), and enhancing long-context comprehension by extending sequence lengths up to 32,768 tokens.
- Scaling Law Guided Tuning: Hyperparameters are systematically tuned across the pre-training pipeline for optimal training dynamics and performance.
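Because this is a base (non-instruct) checkpoint, it is typically used for plain text completion rather than chat. The sketch below assumes the standard Hugging Face `transformers` API with `torch` and `accelerate` installed; the repository id is taken from this card, while the prompt and sampling settings are illustrative choices, not official recommendations.

```python
# Minimal sketch: plain-text completion with a base (non-instruct) model.
# The repo id comes from this card; sampling settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agadelmoula-avey/Qwen3-4B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers on available GPU(s)/CPU (requires accelerate)
)

# Base models carry no chat template, so prompt with raw text to be continued.
prompt = "The three stages of Qwen3 pre-training are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```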
## Model Specifications
- Parameters: 4.0 billion (3.6 billion non-embedding)
- Context Length: 32,768 tokens
- Layers: 36
- Attention Heads (GQA): 32 for Q, 8 for KV
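These figures can be cross-checked against the checkpoint's configuration without downloading the weights. The sketch below reads `config.json` via `transformers`; the field names follow the standard Hugging Face causal-LM configuration and are assumptions on my part rather than values quoted from this card.

```python
# Sketch: fetch only the model configuration and compare it with the numbers above.
# Field names follow the standard Hugging Face causal-LM config (assumed, not from the card).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("agadelmoula-avey/Qwen3-4B-Base")

print("layers:             ", config.num_hidden_layers)        # expected 36
print("attention heads (Q):", config.num_attention_heads)      # expected 32
print("KV heads (GQA):     ", config.num_key_value_heads)      # expected 8
print("max context length: ", config.max_position_embeddings)  # expected 32768
```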
For detailed evaluation results and further information, refer to the official Qwen3 blog and GitHub repository.