CCCCCyx/Qwen3-8B-Base-sft-dolci-think

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Apr 29, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Qwen3-8B-Base-sft-dolci-think is an 8.2 billion parameter causal language model from the Qwen3 series, pre-trained by Qwen. It was trained on an expanded corpus of 36 trillion tokens spanning 119 languages, and incorporates refined training techniques and architectural improvements for better stability and performance. The base model is suited to broad language modeling, general knowledge, and reasoning tasks, and supports a 32,768-token context length.


Qwen3-8B-Base-sft-dolci-think Overview

This model is an 8.2 billion parameter causal language model from the Qwen3 series, developed by Qwen. It represents the latest generation of Qwen models, building on significant advancements in training data, architecture, and optimization. Key improvements over previous iterations include a substantially expanded and higher-quality pre-training corpus, advanced training techniques, and a three-stage pre-training process.
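The checkpoint can be used as a drop-in Qwen3 causal language model. The snippet below is a minimal sketch, assuming the model is published under the repo id shown in the title and follows the standard Hugging Face transformers AutoModelForCausalLM / AutoTokenizer interface; the prompt, dtype, and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CCCCCyx/Qwen3-8B-Base-sft-dolci-think"

# Load tokenizer and weights; device_map="auto" places the model on available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Plain text completion; the checkpoint is driven by a raw prompt here.
prompt = "Explain the three stages of pre-training used in the Qwen3 series."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(completion)
```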

Key Capabilities & Features

  • Expanded Pre-training Corpus: Trained on 36 trillion tokens across 119 languages, with broader language coverage and higher data quality, including coding, STEM, reasoning, and synthetic data.
  • Training and Architectural Refinements: The Qwen3 series adopts a global-batch load-balancing loss for MoE variants and QK layer normalization for all models, improving training stability and overall performance.
  • Three-stage Pre-training: Stage 1 builds broad language modeling and general knowledge; Stage 2 strengthens reasoning skills in STEM, coding, and logical reasoning; Stage 3 extends long-context comprehension up to 32k tokens.
  • Optimized Hyperparameter Tuning: Utilizes scaling law studies to systematically tune critical hyperparameters for better training dynamics and performance across different model scales.
  • Context Length: Supports a context length of 32,768 tokens (a sketch for verifying the configured window follows this list).
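
A quick way to confirm the advertised 32k window is to read the checkpoint's configuration. This is a minimal sketch, assuming the repo ships a standard transformers config where max_position_embeddings reflects the native context length; the sample document is a placeholder.

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "CCCCCyx/Qwen3-8B-Base-sft-dolci-think"

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The native context window; for this model it is expected to report 32768.
print("context window:", config.max_position_embeddings)

# Token-count a long input before generation so over-length prompts are caught
# (and truncated or chunked) on the caller's side rather than inside the model.
long_document = "Lorem ipsum dolor sit amet. " * 2000
n_tokens = len(tokenizer(long_document)["input_ids"])
print("document tokens:", n_tokens,
      "| fits in window:", n_tokens <= config.max_position_embeddings)
```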

Good For

  • Applications requiring strong general language understanding and generation.
  • Tasks benefiting from enhanced reasoning capabilities, including STEM and coding-related problems.
  • Use cases demanding long-context comprehension.
  • Multilingual applications due to its extensive language coverage.