CCCCCyx/Qwen3-8B-onpolicy-profiling-muon-20260413_090005
The CCCCCyx/Qwen3-8B-onpolicy-profiling-muon-20260413_090005 model is an 8.2 billion parameter causal language model from the Qwen3 series, developed by Qwen. It is pre-trained on an expanded 36 trillion token corpus covering 119 languages, using a three-stage pre-training process and architectural refinements such as qk layernorm. The model is designed for broad language modeling and general knowledge acquisition, with enhanced reasoning skills and long-context comprehension up to 32,768 tokens.
Qwen3-8B-Base Overview
This model is an 8.2 billion parameter causal language model, part of the Qwen3 series, which represents the latest generation of Qwen's large language models. It builds upon significant advancements in training data, model architecture, and optimization techniques compared to Qwen2.5.
Key Improvements & Capabilities
- Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, tripling the language coverage of its predecessor. The corpus includes a rich mix of high-quality data such as coding, STEM, reasoning, and multilingual content.
- Architectural Refinements: Incorporates advanced training techniques and architectural improvements, including qk layernorm for enhanced stability and performance (see the sketch after this list).
- Three-stage Pre-training: The training process is divided into three stages:
  - Stage 1: Focuses on broad language modeling and general knowledge.
  - Stage 2: Improves reasoning skills, including STEM, coding, and logical reasoning.
  - Stage 3: Extends training sequence lengths up to 32,768 tokens to enhance long-context comprehension.
- Scaling Law Guided Tuning: Critical hyperparameters were systematically tuned using scaling law studies across the pre-training pipeline, optimizing training dynamics and performance.
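To make the qk layernorm refinement concrete, below is a minimal self-attention sketch that RMS-normalizes the query and key projections per head before computing attention scores. This illustrates the general technique only; it is not the Qwen3 implementation, the layer sizes are placeholders, and it assumes a recent PyTorch that provides `nn.RMSNorm`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Illustrative causal self-attention with per-head RMSNorm on Q and K.

    Dimensions are placeholders, not the actual Qwen3-8B configuration.
    """

    def __init__(self, hidden_size=1024, num_heads=8, head_dim=128):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, head_dim
        self.q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * head_dim, hidden_size, bias=False)
        # Normalizing Q and K per head keeps attention logits in a stable range.
        self.q_norm = nn.RMSNorm(head_dim)
        self.k_norm = nn.RMSNorm(head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim)
        k = self.k_proj(x).view(b, t, self.num_heads, self.head_dim)
        v = self.v_proj(x).view(b, t, self.num_heads, self.head_dim)
        # qk layernorm: normalize each head's query/key vectors before attention.
        q, k = self.q_norm(q), self.k_norm(k)
        q, k, v = (z.transpose(1, 2) for z in (q, k, v))  # (b, heads, t, head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```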
Model Specifications
- Parameters: 8.2 billion (6.95 billion non-embedding parameters)
- Context Length: 32,768 tokens
- Layers: 36
- Attention Heads (GQA): 32 for Q, 8 for KV
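The grouped-query attention (GQA) figures above mean that the 32 query heads share 8 key/value heads, i.e. each group of 4 query heads attends over one shared KV head, which shrinks the KV cache by 4x. The sketch below illustrates this sharing with the head counts from the spec list; the head dimension and tensor shapes are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

# Head counts taken from the spec list above; head_dim is an assumed value.
num_q_heads, num_kv_heads, head_dim = 32, 8, 128
groups = num_q_heads // num_kv_heads  # 4 query heads share each KV head

batch, seq = 1, 16
q = torch.randn(batch, num_q_heads, seq, head_dim)
k = torch.randn(batch, num_kv_heads, seq, head_dim)
v = torch.randn(batch, num_kv_heads, seq, head_dim)

# Grouped-query attention: expand the 8 KV heads so each of the 32 Q heads
# attends to its group's shared keys/values; the KV cache only stores 8 heads.
k = k.repeat_interleave(groups, dim=1)
v = v.repeat_interleave(groups, dim=1)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 32, 16, 128])
```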
For detailed evaluation results and further information, refer to the official Qwen3 blog and GitHub repository.
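As a quick illustration of how this checkpoint can be used, here is a minimal loading-and-generation sketch with the Hugging Face transformers AutoClasses. It assumes the checkpoint loads like a standard causal LM from the Hub, that `transformers` and `accelerate` are installed, and that enough GPU memory is available for an 8.2B-parameter model in bfloat16.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CCCCCyx/Qwen3-8B-onpolicy-profiling-muon-20260413_090005"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8.2B parameters; bfloat16 roughly halves memory vs. fp32
    device_map="auto",
)

# This is a base (non-instruct) model, so use plain text continuation rather than a chat template.
prompt = "The three stages of Qwen3 pre-training are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```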