ChuGyouk/Qwen3-14B-Base

Text Generation · Concurrency Cost: 1 · Model Size: 14B · Quant: FP8 · Context Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights

Qwen3-14B-Base is a 14.8-billion-parameter causal language model from the Qwen3 series, developed by the Qwen team. It was pre-trained on an expanded, higher-quality corpus of 36 trillion tokens spanning 119 languages, using techniques such as QK layernorm and a three-stage pre-training pipeline. The model targets broad language modeling, general knowledge acquisition, and enhanced reasoning, and supports a context length of 32,768 tokens.


Overview

Qwen3-14B-Base is a 14.8-billion-parameter pre-trained causal language model, part of the latest Qwen series. It builds upon Qwen2.5 with significant advancements in training data, model architecture, and optimization techniques, and supports a context length of 32,768 tokens.

Key Improvements & Capabilities

  • Expanded Higher-Quality Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, tripling the language coverage of its predecessor. This corpus includes a rich mix of coding, STEM, reasoning, book, multilingual, and synthetic data.
  • Advanced Training Techniques: Incorporates architectural refinements such as QK layernorm for improved training stability and performance across all models in the series.
  • Three-stage Pre-training:
    • Stage 1: Focuses on broad language modeling and general knowledge.
    • Stage 2: Enhances reasoning skills, including STEM, coding, and logical reasoning.
    • Stage 3: Improves long-context comprehension by extending training sequence lengths.
  • Scaling Law Guided Hyperparameter Tuning: Critical hyperparameters were systematically tuned for dense and MoE models, optimizing training dynamics and final performance.

Model Specifications

  • Type: Causal Language Model
  • Parameters: 14.8 billion (13.2 billion non-embedding)
  • Layers: 40
  • Attention Heads (GQA): 40 for Q, 8 for KV
  • Context Length: 32,768 tokens
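The grouped-query attention (GQA) configuration above (40 query heads sharing 8 KV heads) directly shrinks the KV cache needed at inference time. A minimal back-of-the-envelope sketch, assuming a head dimension of 128 and an FP16 cache (both are assumptions, not stated on this card):

```python
# Estimate KV-cache footprint implied by the specs above.
# Assumptions (not from the model card): head_dim = 128, FP16 cache (2 bytes/value).
LAYERS = 40
Q_HEADS = 40
KV_HEADS = 8
HEAD_DIM = 128   # assumed
BYTES_PER_VALUE = 2  # fp16

def kv_cache_bytes(tokens: int, kv_heads: int) -> int:
    # K and V each store kv_heads * HEAD_DIM values per layer per token,
    # hence the leading factor of 2.
    return 2 * LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE * tokens

full_ctx = 32_768
gqa = kv_cache_bytes(full_ctx, KV_HEADS)
mha = kv_cache_bytes(full_ctx, Q_HEADS)
print(f"GQA KV cache at 32k context: {gqa / 2**30:.1f} GiB")
print(f"Full MHA would need:         {mha / 2**30:.1f} GiB "
      f"({Q_HEADS // KV_HEADS}x more)")
```

Under these assumptions, sharing 8 KV heads across 40 query heads cuts the 32k-context KV cache by a factor of five relative to full multi-head attention.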

For detailed evaluation results and further information, refer to the official Qwen3 blog and GitHub repository.