willhx/Qwen3-8B-Base-Math

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Jun 20, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights
  • Cold

The Qwen3-8B-Base-Math model by Qwen is an 8.2 billion parameter causal language model, part of the Qwen3 series, pre-trained on 36 trillion tokens across 119 languages. Its three-stage pre-training process covers broad language modeling first, then reasoning skills (including STEM and coding), and finally long-context comprehension up to 32,768 tokens. The model targets general knowledge acquisition and stronger logical reasoning.


Overview

Qwen3-8B-Base-Math is an 8.2 billion parameter causal language model from the Qwen3 series, developed by Qwen. It builds upon previous generations with significant advancements in its training corpus, model architecture, and optimization techniques. The model was pre-trained on an expanded, higher-quality dataset of 36 trillion tokens covering 119 languages, a substantial increase in linguistic diversity and data richness compared to Qwen2.5.
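
Since the page describes a causal language model, the most direct way to use it is plain text continuation. The following is a minimal sketch, assuming the repository named at the top of this page can be loaded with the Hugging Face `transformers` library (plus `torch` and `accelerate`); that has not been verified here. The helpers `clamp_new_tokens` and `complete` are hypothetical names introduced for illustration only.

```python
MODEL_ID = "willhx/Qwen3-8B-Base-Math"  # repository name shown on this page
CONTEXT_LENGTH = 32_768                 # context length stated on this page


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Limit generation so prompt + new tokens stay within the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt already fills the context window")
    return min(requested, context_length - prompt_tokens)


def complete(prompt: str, requested_new_tokens: int = 256) -> str:
    """Continue `prompt` with the model (hypothetical helper, not a library API)."""
    # Imported lazily so the budgeting helper above works without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: device_map="auto" is available, which requires `accelerate`.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_tokens = inputs["input_ids"].shape[1]
    output_ids = model.generate(
        **inputs,
        max_new_tokens=clamp_new_tokens(prompt_tokens, requested_new_tokens),
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][prompt_tokens:], skip_special_tokens=True)
```

A call such as `complete("Problem: compute 17 * 23.\nSolution:")` would download the weights on first use. The sketch passes raw text rather than a chat template, on the assumption that "Base" in the model name denotes a pre-trained model without instruction tuning.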

Key Capabilities & Improvements

  • Expanded Pre-training Corpus: Utilizes 36 trillion tokens across 119 languages, with a focus on high-quality data including coding, STEM, reasoning, and multilingual content.
  • Training Techniques and Architectural Refinements: The Qwen3 series incorporates global-batch load balancing loss (for its MoE models) and qk layernorm (for all models) for improved stability and performance.
  • Three-stage Pre-training: A structured approach that first establishes general knowledge, then enhances reasoning skills (STEM, coding, logical reasoning), and finally extends long-context comprehension up to 32,768 tokens.
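  • Scaling Law Guided Tuning: Guided by scaling law studies, key hyperparameters (such as the learning rate scheduler and batch size) are tuned systematically across the three-stage pre-training pipeline to improve training dynamics and final performance.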
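
To illustrate why normalizing queries and keys can help stability, here is a small pure-Python sketch of the general idea. It assumes an RMS-style normalization with unit gain applied per head before the dot product. The page does not specify the exact variant, and the real model would use learned parameters, so treat this as an illustration rather than the model's implementation.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Scale `vec` so its root-mean-square is about 1 (unit gain, no learned weight)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def attention_logit(q, k, use_qk_norm):
    """Scaled dot-product logit for one query/key pair within a single head."""
    if use_qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


head_dim = 8
q = [50.0 * (i + 1) for i in range(head_dim)]  # deliberately large activations
k = [40.0 * (i + 1) for i in range(head_dim)]

raw = attention_logit(q, k, use_qk_norm=False)
normed = attention_logit(q, k, use_qk_norm=True)

print(f"logit without qk norm: {raw:.1f}")
print(f"logit with qk norm:    {normed:.3f} (bounded by sqrt(head_dim) = {math.sqrt(head_dim):.3f})")
```

With unit-gain normalization, each vector has norm at most sqrt(head_dim), so the scaled logit cannot exceed sqrt(head_dim) in magnitude regardless of how large the raw activations grow. That keeps the softmax inputs in a controlled range.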

Model Specifications

  • Type: Causal Language Model
  • Parameters: 8.2 billion (6.95 billion non-embedding)
  • Context Length: 32,768 tokens
  • Layers: 36
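
The figures above allow a quick back-of-the-envelope check. The sketch below derives the embedding parameter count from the two listed numbers and estimates weight memory, assuming 1 byte per parameter for FP8 (the quantization listed in the page header) and 2 bytes for BF16. The estimate covers weights only and ignores the KV cache, activations, and runtime overhead.

```python
TOTAL_PARAMS = 8.2e9           # from the specification list
NON_EMBEDDING_PARAMS = 6.95e9  # from the specification list

# Parameters held in the embedding tables (total minus non-embedding).
embedding_params = TOTAL_PARAMS - NON_EMBEDDING_PARAMS


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1)   # assumption: 1 byte per FP8 weight
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)  # assumption: 2 bytes per BF16 weight

print(f"embedding parameters: {embedding_params / 1e9:.2f} billion")
print(f"weights in FP8:  ~{fp8_gb:.1f} GB")
print(f"weights in BF16: ~{bf16_gb:.1f} GB")
```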

Further Information

For detailed evaluation results and technical insights, refer to the official Qwen3 blog and GitHub repository.