Qwen3-14B-Base Overview
Qwen3-14B-Base is a 14.8-billion-parameter pre-trained causal language model from the Qwen series, built on advances in training data, architecture, and optimization. Its pre-training corpus has been substantially expanded to 36 trillion tokens across 119 languages, tripling the language coverage of its predecessor, Qwen2.5. The corpus includes a rich mix of high-quality data, such as coding, STEM, reasoning, and multilingual content.
Key Improvements & Features
- Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, with a focus on high-quality data for coding, STEM, and reasoning.
- Architectural Refinements: Incorporates training techniques such as global-batch load-balancing loss for MoE models and QK layernorm for all models, improving training stability and performance (see the attention sketch after this list).
- Three-stage Pre-training: A structured approach that first builds general language modeling capability, then strengthens reasoning skills (STEM, coding, logical reasoning), and finally extends long-context comprehension to sequences of up to 32,768 tokens.
- Scaling Law Guided Tuning: Hyperparameters are systematically tuned using scaling law studies across the pre-training pipeline for optimal training dynamics and performance.
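To make the QK layernorm and grouped-query attention points above concrete, the sketch below shows one possible attention block that normalizes queries and keys per head before computing attention. This is an illustrative assumption, not Qwen3's actual implementation: the module name `QKNormAttention`, the hidden size of 5120, and the use of `nn.RMSNorm` (requires PyTorch >= 2.4) are assumptions for the sake of a runnable example; only the 40 query heads and 8 KV heads come from the Model Specifications listed below.

```python
# Illustrative sketch of QK layernorm + grouped-query attention (not Qwen3 source code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, hidden_size=5120, num_q_heads=40, num_kv_heads=8):
        super().__init__()
        self.head_dim = hidden_size // num_q_heads
        self.num_q_heads = num_q_heads
        self.num_kv_heads = num_kv_heads
        self.q_proj = nn.Linear(hidden_size, num_q_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_q_heads * self.head_dim, hidden_size, bias=False)
        # QK layernorm: normalize queries and keys per head before attention.
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_q_heads, self.head_dim)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim)
        q, k = self.q_norm(q), self.k_norm(k)  # per-head normalization
        # Grouped-query attention: each KV head is shared by several query heads.
        k = k.repeat_interleave(self.num_q_heads // self.num_kv_heads, dim=2)
        v = v.repeat_interleave(self.num_q_heads // self.num_kv_heads, dim=2)
        q, k, v = (z.transpose(1, 2) for z in (q, k, v))  # (b, heads, t, head_dim)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(attn.transpose(1, 2).reshape(b, t, -1))

# Quick shape check with random input.
x = torch.randn(1, 16, 5120)
print(QKNormAttention()(x).shape)  # torch.Size([1, 16, 5120])
```

In general, normalizing queries and keys per head keeps the attention logits in a bounded range regardless of how large the projection outputs grow, which is the intuition behind the stability benefit mentioned above.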
Model Specifications
- Parameters: 14.8 billion (13.2 billion non-embedding)
- Context Length: 32,768 tokens
- Layers: 40
- Attention Heads (GQA): 40 for Q, 8 for KV
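As a quick way to check these specifications and run the model, the snippet below loads Qwen3-14B-Base with Hugging Face transformers and prints the relevant config fields. This is a minimal sketch, assuming a transformers version with Qwen3 support (roughly 4.51 or later) and enough memory for a 14.8B-parameter model; the expected values in the comments simply mirror the list above.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-14B-Base"

# Loading only the config is lightweight and confirms the specifications above.
config = AutoConfig.from_pretrained(model_name)
print(config.num_hidden_layers)         # expected: 40
print(config.num_attention_heads)       # expected: 40 (query heads)
print(config.num_key_value_heads)       # expected: 8  (KV heads, GQA)
print(config.max_position_embeddings)   # expected: 32768

# Full model load for base-model (text completion) inference.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
inputs = tokenizer("The three stages of Qwen3 pre-training are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Here `torch_dtype="auto"` and `device_map="auto"` let transformers pick the checkpoint's native precision and an available device; adjust these to your hardware as needed.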
For detailed evaluation results and further information, refer to the official Qwen3 blog and GitHub repository.