dnotitia/Qwen3-4B-Base
dnotitia/Qwen3-4B-Base is a 4.0 billion parameter causal language model from the Qwen3 series, pre-trained on 36 trillion tokens across 119 languages with a 32,768 token context length. Developed by the Qwen team and patched by dnotitia for improved training compatibility, it incorporates advanced training techniques and architectural refinements such as qk layernorm. This base model is designed for broad language modeling, general knowledge acquisition, and enhanced reasoning skills, making it suitable for efficient training experiments.
Qwen3-4B-Base Overview
dnotitia/Qwen3-4B-Base is a 4.0 billion parameter causal language model, part of the Qwen3 series. This specific version, patched by dnotitia, maintains the original Qwen3 weights but includes a refactored chat template and {% generation %} tags for better compatibility with the trl library's assistant_only_loss feature, making it ideal for efficient training experiments.
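The effect of assistant_only_loss is that loss is computed only on assistant-reply tokens; the {% generation %} tags in the chat template are what let the trainer locate those spans. A minimal sketch of the resulting masking step in plain Python (independent of trl; the function name and token values are illustrative):

```python
# Sketch of assistant-only loss masking: label positions outside the
# assistant spans are set to -100, the index that cross-entropy ignores.
# (Illustrative only; trl derives the mask from {% generation %} tags.)
IGNORE_INDEX = -100

def mask_non_assistant(input_ids, assistant_mask):
    """Return labels where only assistant tokens contribute to the loss."""
    return [tok if is_assistant else IGNORE_INDEX
            for tok, is_assistant in zip(input_ids, assistant_mask)]

# Example: a 6-token sequence where the last 3 tokens are the assistant reply.
input_ids = [101, 2023, 2003, 78, 912, 340]
assistant_mask = [0, 0, 0, 1, 1, 1]
labels = mask_non_assistant(input_ids, assistant_mask)
# labels -> [-100, -100, -100, 78, 912, 340]
```

Without the {% generation %} markers in the template, the trainer has no reliable way to build this mask, which is why the patched template matters for this workflow.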
Key Qwen3 Highlights
Qwen3 represents the latest generation of Qwen models, built upon significant advancements in training data, model architecture, and optimization. Key improvements over Qwen2.5 include:
- Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, tripling the language coverage of its predecessor. The dataset is rich in high-quality data, including coding, STEM, reasoning, and multilingual content.
- Advanced Training Techniques: Incorporates architectural refinements such as global-batch load balancing loss for MoE models and qk layernorm for all models, enhancing stability and performance.
- Three-stage Pre-training: A structured approach that first builds broad language modeling and general knowledge, then strengthens reasoning skills (STEM, coding), and finally extends long-context comprehension up to 32k tokens.
- Scaling Law Guided Tuning: Critical hyperparameters were systematically tuned using scaling law studies to optimize training dynamics and performance across different model scales.
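Of the refinements above, qk layernorm is the one applied to all Qwen3 models: query and key vectors are normalized per head before the attention score, which bounds the scale of the attention logits and stabilizes training. A toy sketch of the idea using RMS normalization (illustrative only; real implementations operate on tensors and include a learned scale):

```python
import math

def rms_norm(v, eps=1e-6):
    # RMS-normalize a vector so its root-mean-square is ~1.
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]

def attention_logit(q, k):
    # qk layernorm step: normalize q and k before the scaled dot product.
    q, k = rms_norm(q), rms_norm(k)
    scale = 1.0 / math.sqrt(len(q))
    return scale * sum(qi * ki for qi, ki in zip(q, k))

# Even with a pathologically large query, the logit stays bounded
# (at most sqrt(d) in magnitude for head dimension d).
logit = attention_logit([1000.0, -2000.0, 500.0, 0.0], [0.1, 0.2, -0.3, 0.4])
```

Without the normalization, a large-magnitude query or key would blow up the logit, pushing the softmax into saturation; normalizing both sides removes that failure mode.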
Model Specifications
- Parameters: 4.0 billion (3.6 billion non-embedding)
- Layers: 36
- Attention Heads (GQA): 32 for Q, 8 for KV
- Context Length: 32,768 tokens
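With grouped-query attention, the 8 KV heads are shared across the 32 query heads, so each KV head serves a group of 4 query heads and the KV cache is a quarter the size of full multi-head attention. A quick check of that arithmetic (the head dimension below is a hypothetical placeholder, not stated in the specs above):

```python
# GQA arithmetic for the specs above: 32 query heads share 8 KV heads.
n_q_heads, n_kv_heads = 32, 8

group_size = n_q_heads // n_kv_heads      # query heads served per KV head
kv_cache_ratio = n_kv_heads / n_q_heads   # KV cache size vs. full MHA

# group_size -> 4, kv_cache_ratio -> 0.25 (a 4x smaller KV cache)

# Per-token KV cache elements per layer: 2 (K and V) * n_kv_heads * head_dim.
head_dim = 128  # hypothetical value; not specified in the card above
per_token_per_layer = 2 * n_kv_heads * head_dim  # -> 2048 elements
```

This 4x cache reduction is what makes the 32,768-token context practical to serve at this parameter scale.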
For detailed evaluation results and further information, refer to the official Qwen3 blog and GitHub repository.