Overview
Qwen3-14B-Base is a 14.8-billion-parameter pre-trained causal language model from the latest generation of the Qwen series. It builds on Qwen2.5 with significant advances in training data, model architecture, and optimization techniques, and supports a context length of 32,768 tokens.
Key Improvements & Capabilities
- Expanded Higher-Quality Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, tripling the language coverage of its predecessor. This corpus includes a rich mix of coding, STEM, reasoning, book, multilingual, and synthetic data.
- Advanced Training Techniques: Incorporates architectural refinements such as QK-LayerNorm, improving training stability and performance across all models in the series.
- Three-stage Pre-training:
  - Stage 1: Focuses on broad language modeling and general knowledge.
  - Stage 2: Enhances reasoning skills, including STEM, coding, and logical reasoning.
  - Stage 3: Improves long-context comprehension by extending training sequence lengths.
- Scaling-Law-Guided Hyperparameter Tuning: Critical hyperparameters were systematically tuned for both dense and MoE models, optimizing training dynamics and final performance.
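To make the QK-LayerNorm refinement above concrete, the sketch below applies an RMS-style normalization to the query and key vectors before the attention dot product, which bounds the attention-logit magnitude. This is a minimal, illustrative NumPy version for a single head; shapes, epsilon, and the exact normalization placement are assumptions, not Qwen3's actual implementation.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Root-mean-square normalization over the last (head) dimension.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def qk_norm_attention(q, k, v):
    """Scaled dot-product attention with RMS normalization on Q and K.

    q, k, v: arrays of shape (seq_len, head_dim). Normalizing Q and K
    keeps the pre-softmax logits bounded, which helps training stability.
    """
    q, k = rms_norm(q), rms_norm(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = qk_norm_attention(q, k, v)
print(out.shape)  # (4, 8)
```

In a real transformer block this normalization is applied per attention head after the Q/K projections; the gain here is that logit scale no longer grows with the norm of the projected activations.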
Model Specifications
- Type: Causal Language Model
- Parameters: 14.8 billion (13.2 billion non-embedding)
- Layers: 40
- Attention Heads (GQA): 40 for Q, 8 for KV
- Context Length: 32,768 tokens
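The specifications above determine the inference-time KV-cache footprint, and the 8 KV heads (vs. 40 query heads) show why GQA matters. A back-of-the-envelope estimate follows; the per-head dimension of 128 is an assumption (it is not listed in the specs above), as is the 2-byte fp16/bf16 cache precision.

```python
# KV-cache memory estimate for Qwen3-14B-Base from the specs above.
layers = 40        # transformer layers
kv_heads = 8       # GQA key/value heads (vs. 40 query heads)
head_dim = 128     # ASSUMED per-head dimension, not stated in the spec list
bytes_per_val = 2  # fp16/bf16 cache precision (assumed)

# K and V are each cached per layer, per KV head, per token.
per_token = 2 * layers * kv_heads * head_dim * bytes_per_val
full_context = per_token * 32_768  # full 32,768-token context

print(f"{per_token / 1024:.0f} KiB per token")            # 160 KiB
print(f"{full_context / 2**30:.1f} GiB at full context")  # 5.0 GiB
```

Under these assumptions, full multi-head attention (40 KV heads instead of 8) would need five times as much cache memory, which is the practical motivation for grouped-query attention.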
For detailed evaluation results and further information, refer to the official Qwen3 blog and GitHub repository.