CCCCCyx/Qwen3-8B-onpolicy-profiling-muon-20260413_090005
The CCCCCyx/Qwen3-8B-onpolicy-profiling-muon-20260413_090005 model is an 8.2 billion parameter causal language model from the Qwen3 series, developed by Qwen. It is pre-trained on an expanded 36 trillion token corpus covering 119 languages, using a three-stage pre-training process and architectural refinements such as qk layernorm. The model is designed for broad language modeling and general knowledge acquisition, with enhanced reasoning skills and long-context comprehension up to 32,768 tokens.
Qwen3-8B-Base Overview
This model is an 8.2 billion parameter causal language model, part of the Qwen3 series, which represents the latest generation of Qwen's large language models. It builds upon significant advancements in training data, model architecture, and optimization techniques compared to Qwen2.5.
Key Improvements & Capabilities
- Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, tripling the language coverage of its predecessor. The corpus includes a rich mix of high-quality data such as coding, STEM, reasoning, and multilingual content.
- Architectural Refinements: Incorporates advanced training techniques and architectural improvements, including qk layernorm for enhanced stability and performance (see the sketch after this list).
- Three-stage Pre-training: The training process is divided into three stages:
  - Stage 1: Focuses on broad language modeling and general knowledge.
  - Stage 2: Improves reasoning skills, including STEM, coding, and logical reasoning.
  - Stage 3: Extends training sequence lengths up to 32,768 tokens to enhance long-context comprehension.
- Scaling Law Guided Tuning: Critical hyperparameters were systematically tuned using scaling law studies across the pre-training pipeline, optimizing training dynamics and performance.
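To make the qk layernorm refinement concrete, below is a minimal self-attention sketch that RMS-normalizes the query and key projections per head before computing attention scores. This illustrates the general technique only; it is not the Qwen3 implementation, the layer sizes are placeholders, and it assumes a recent PyTorch that provides `nn.RMSNorm`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Illustrative causal self-attention with per-head RMSNorm on Q and K.

    Dimensions are placeholders, not the actual Qwen3-8B configuration.
    """

    def __init__(self, hidden_size=1024, num_heads=8, head_dim=128):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, head_dim
        self.q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * head_dim, hidden_size, bias=False)
        # Normalizing Q and K per head keeps attention logits in a stable range.
        self.q_norm = nn.RMSNorm(head_dim)
        self.k_norm = nn.RMSNorm(head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim)
        k = self.k_proj(x).view(b, t, self.num_heads, self.head_dim)
        v = self.v_proj(x).view(b, t, self.num_heads, self.head_dim)
        # qk layernorm: normalize each head's query/key vectors before attention.
        q, k = self.q_norm(q), self.k_norm(k)
        q, k, v = (z.transpose(1, 2) for z in (q, k, v))  # (b, heads, t, head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```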
Model Specifications
- Parameters: 8.2 billion (6.95 billion non-embedding parameters)
- Context Length: 32,768 tokens
- Layers: 36
- Attention Heads (GQA): 32 for Q, 8 for KV
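The grouped-query attention (GQA) figures above mean that the 32 query heads share 8 key/value heads, i.e. each group of 4 query heads attends over one shared KV head, which shrinks the KV cache by 4x. The sketch below illustrates this sharing with the head counts from the spec list; the head dimension and tensor shapes are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

# Head counts taken from the spec list above; head_dim is an assumed value.
num_q_heads, num_kv_heads, head_dim = 32, 8, 128
groups = num_q_heads // num_kv_heads  # 4 query heads share each KV head

batch, seq = 1, 16
q = torch.randn(batch, num_q_heads, seq, head_dim)
k = torch.randn(batch, num_kv_heads, seq, head_dim)
v = torch.randn(batch, num_kv_heads, seq, head_dim)

# Grouped-query attention: expand the 8 KV heads so each of the 32 Q heads
# attends to its group's shared keys/values; the KV cache only stores 8 heads.
k = k.repeat_interleave(groups, dim=1)
v = v.repeat_interleave(groups, dim=1)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 32, 16, 128])
```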
For detailed evaluation results and further information, refer to the official Qwen3 blog and GitHub repository.
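As a quick illustration of how this checkpoint can be used, here is a minimal loading-and-generation sketch with the Hugging Face transformers AutoClasses. It assumes the checkpoint loads like a standard causal LM from the Hub, that `transformers` and `accelerate` are installed, and that enough GPU memory is available for an 8.2B-parameter model in bfloat16.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CCCCCyx/Qwen3-8B-onpolicy-profiling-muon-20260413_090005"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8.2B parameters; bfloat16 roughly halves memory vs. fp32
    device_map="auto",
)

# This is a base (non-instruct) model, so use plain text continuation rather than a chat template.
prompt = "The three stages of Qwen3 pre-training are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```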