prompt-agnostic-language-models/Qwen-8B_all_shuffled
Qwen-8B_all_shuffled is an 8.2-billion-parameter causal language model from the Qwen3 series, developed by Qwen. It is a pre-trained base model trained on a corpus of 36 trillion tokens spanning 119 languages, which broadens its multilingual coverage and general knowledge. Pre-training ran in three stages covering broad language modeling, reasoning skills, and long-context comprehension up to 32,768 tokens, alongside architectural refinements.
Qwen3-8B-Base Overview
Qwen3-8B-Base is an 8.2 billion parameter causal language model, part of the latest Qwen3 series developed by Qwen. This model is a pre-trained base version, distinguished by significant advancements over its predecessor, Qwen2.5.
Key Improvements and Features
- Expanded Pre-training Corpus: Trained on an extensive 36 trillion tokens across 119 languages, tripling the language coverage of Qwen2.5. The dataset includes a rich mix of high-quality data for coding, STEM, reasoning, and multilingual tasks.
- Advanced Training Techniques: Combines architectural refinements such as QK layernorm with a three-stage pre-training process. Stage 1 focuses on general language modeling, Stage 2 strengthens reasoning (STEM, coding, logical reasoning), and Stage 3 extends long-context comprehension to sequences of up to 32,768 tokens.
- Optimized Hyperparameter Tuning: Utilizes scaling law studies to systematically tune hyperparameters, improving training dynamics and performance across different model scales.
- Technical Specifications: 36 layers, grouped-query attention with 32 query (Q) heads and 8 key/value (KV) heads, and a context length of 32,768 tokens.
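To make the attention figures above concrete, here is a rough sketch of how the 8 KV heads affect key/value cache size at the full 32,768-token context. The layer count, head counts, and context length come from the list above. The per-head dimension of 128 and the 2-byte (bf16/fp16) cache precision are assumptions not stated in this card, and `AttentionSpec` and `kv_cache_bytes` are hypothetical helper names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionSpec:
    num_layers: int = 36       # from the specifications above
    num_q_heads: int = 32      # from the specifications above
    num_kv_heads: int = 8      # from the specifications above
    context_len: int = 32_768  # from the specifications above
    head_dim: int = 128        # ASSUMPTION: not stated in this card


def kv_cache_bytes(spec, seq_len=None, bytes_per_value=2, kv_heads=None):
    """Estimate KV-cache size for one sequence.

    bytes_per_value=2 assumes a bf16/fp16 cache. The leading factor 2
    accounts for storing both keys and values in every layer.
    """
    seq_len = spec.context_len if seq_len is None else seq_len
    kv_heads = spec.num_kv_heads if kv_heads is None else kv_heads
    return 2 * spec.num_layers * kv_heads * spec.head_dim * seq_len * bytes_per_value


spec = AttentionSpec()
gqa = kv_cache_bytes(spec)                             # 8 KV heads, as listed
mha = kv_cache_bytes(spec, kv_heads=spec.num_q_heads)  # hypothetical: one KV head per Q head

print(f"Q heads per KV head: {spec.num_q_heads // spec.num_kv_heads}")
print(f"KV cache with 8 KV heads:  {gqa / 2**30:.1f} GiB")
print(f"KV cache with 32 KV heads: {mha / 2**30:.1f} GiB")
```

Under these assumptions the cache for a single full-length sequence is about 4.5 GiB, a quarter of what it would be if each of the 32 query heads had its own KV head.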
When to Use This Model
This model is suitable for applications requiring a robust, multilingual base model with strong general knowledge and reasoning capabilities, especially where long context understanding is beneficial. Its pre-trained nature makes it a strong foundation for further fine-tuning on specific tasks.
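As a starting point, the following is a minimal sketch of plain text completion with the Hugging Face `transformers` library. It assumes the repository named at the top of this card hosts a standard `transformers` checkpoint, and that `torch`, `transformers`, and `accelerate` (needed for `device_map="auto"`) are installed. `max_new_tokens` and `complete` are hypothetical helper names. Because this is a base model, the sketch passes raw text rather than a chat-formatted prompt.

```python
MODEL_ID = "prompt-agnostic-language-models/Qwen-8B_all_shuffled"  # repo id from this card
CONTEXT_LEN = 32_768  # context length stated in this card


def max_new_tokens(prompt_len, requested, context_len=CONTEXT_LEN):
    """Cap generation so prompt plus output stays inside the context window."""
    if prompt_len >= context_len:
        raise ValueError("prompt already fills the context window")
    return min(requested, context_len - prompt_len)


def complete(prompt, requested=64, model_id=MODEL_ID):
    """Continue `prompt` with up to `requested` new tokens."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # ASSUMPTION: hardware supports bf16
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    budget = max_new_tokens(prompt_len, requested)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=budget)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0, prompt_len:], skip_special_tokens=True)


if __name__ == "__main__":
    print(complete("The three stages of pre-training are"))
```

Running the script downloads the full checkpoint, so it needs enough disk space and accelerator memory for an 8.2-billion-parameter model.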