unsloth/Qwen3-14B-Base

14B parameters · FP8 · 32,768-token context length · License: apache-2.0

Qwen3-14B-Base Overview

Qwen3-14B-Base is a 14.8-billion-parameter causal language model in the latest Qwen series. Developed by the Qwen team, it builds on advances in training data, model architecture, and optimization techniques, offering significant improvements over its predecessors, and supports a 32,768-token context length for robust long-context comprehension.
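As a pretrained base model, it is suited to plain text completion rather than chat. The following is a minimal generation sketch, assuming the Hugging Face `transformers` library (plus `accelerate` for automatic device placement) and hardware with enough memory for a 14.8B-parameter checkpoint; the repo id is the one named on this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Qwen3-14B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread layers across available devices (needs accelerate)
)

# Base (pretraining-only) model: plain continuation, no chat template.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```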

Key Capabilities and Improvements

  • Expanded Pre-training Corpus: Trained on an extensive 36 trillion tokens covering 119 languages, tripling the language coverage of Qwen2.5. The corpus includes a rich mix of high-quality data such as coding, STEM, reasoning, and multilingual content.
  • Architectural Refinements: Incorporates advanced training techniques and architectural improvements, including QK layer normalization, enhancing stability and overall performance.
  • Three-stage Pre-training: Employs a structured pre-training approach:
    • Stage 1: Focuses on broad language modeling and general knowledge.
    • Stage 2: Improves reasoning skills, including STEM, coding, and logical reasoning.
    • Stage 3: Extends training sequence lengths up to 32k tokens for enhanced long-context comprehension (a length-check sketch follows this list).
  • Optimized Hyperparameter Tuning: Utilizes scaling law studies to systematically tune critical hyperparameters for improved training dynamics and performance across different model scales.
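To make use of that 32k window in practice, it helps to confirm that an input actually fits before generation. The snippet below is a small illustrative check, assuming only the `transformers` tokenizer for the checkpoint named on this card; the `long_document` variable is a placeholder for your own text.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-14B-Base")

long_document = "..."  # placeholder: substitute your own long input text
n_tokens = len(tokenizer(long_document)["input_ids"])

CONTEXT_LIMIT = 32768  # context length stated on this card
print(f"{n_tokens} tokens; fits in context window: {n_tokens <= CONTEXT_LIMIT}")
```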

Model Specifications

  • Type: Causal Language Model
  • Training Stage: Pretraining
  • Parameters: 14.8 billion (13.2 billion non-embedding)
  • Layers: 40
  • Attention Heads (GQA): 40 for Q, 8 for KV
  • Context Length: 32,768 tokens
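These architecture numbers can be cross-checked against the published model config, as in the sketch below; it assumes the `transformers` library and the standard Hugging Face attribute names for the Qwen3 architecture, and downloads only the config file, not the weights.

```python
from transformers import AutoConfig

# Fetch only the configuration for the checkpoint named on this card.
config = AutoConfig.from_pretrained("unsloth/Qwen3-14B-Base")

# Attribute names are the usual Hugging Face config fields (assumed here).
print(config.num_hidden_layers)        # expected: 40 layers
print(config.num_attention_heads)      # expected: 40 query heads
print(config.num_key_value_heads)      # expected: 8 key/value heads (GQA)
print(config.max_position_embeddings)  # expected: 32768-token context
```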

For detailed evaluation results and further information, refer to the Qwen3 blog and GitHub repository.