Qwen/Qwen3-0.6B-Base

Text Generation · Concurrency Cost: 1 · Model Size: 0.8B · Quant: BF16 · Ctx Length: 32k · Published: Apr 28, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Qwen/Qwen3-0.6B-Base is a 0.6 billion parameter causal language model developed by Qwen, part of the Qwen3 series. Pre-trained on 36 trillion tokens across 119 languages, it draws on an expanded, high-quality corpus and incorporates architectural refinements such as QK layer normalization. The model is designed for broad language modeling and general knowledge acquisition, with a focus on improved reasoning skills and long-context comprehension up to 32,768 tokens.


Qwen3-0.6B-Base Overview

Qwen3-0.6B-Base is a 0.6 billion parameter causal language model from the Qwen3 series, developed by Qwen. It represents the latest generation of Qwen models, incorporating significant advancements in training data, model architecture, and optimization techniques. This base model is pre-trained and designed for general language understanding and generation tasks.

Key Capabilities & Features

  • Expanded Pre-training Corpus: Trained on an extensive 36 trillion tokens across 119 languages, tripling the language coverage of its predecessor, Qwen2.5. The corpus includes a rich mix of high-quality data for coding, STEM, reasoning, and multilingual tasks.
  • Architectural Refinements: Integrates training techniques and architectural improvements, such as QK layer normalization (applied to query and key projections), to enhance training stability and overall performance.
  • Three-stage Pre-training: Employs a staged pre-training approach focusing on broad language modeling, followed by improved reasoning skills (STEM, coding, logical reasoning), and finally enhanced long-context comprehension.
  • Long Context Window: Supports a context length of up to 32,768 tokens, facilitating processing of longer inputs and generating more coherent extended outputs.

When to Use This Model

Qwen3-0.6B-Base is suitable for developers seeking a compact yet capable base model for various natural language processing tasks. Its extensive multilingual training and focus on reasoning and long-context understanding make it a strong candidate for applications requiring general language intelligence, especially in multilingual environments or tasks benefiting from a larger context window. It serves as a foundational model for further fine-tuning on specific downstream applications.
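A minimal inference sketch using the standard Hugging Face `transformers` loading pattern. The `fits_in_context` helper and `generate_text` function names are illustrative, not part of any official API; the model download happens only when `generate_text` is actually called, and requires `transformers` and `torch` to be installed.

```python
# Sketch: loading Qwen3-0.6B-Base for text generation via transformers.
# Helper names (fits_in_context, generate_text) are illustrative.

MODEL_ID = "Qwen/Qwen3-0.6B-Base"
MAX_CONTEXT = 32768  # context window stated on the model card


def fits_in_context(n_prompt_tokens: int, n_new_tokens: int,
                    max_context: int = MAX_CONTEXT) -> bool:
    """Check that the prompt plus the requested completion fits the window."""
    return n_prompt_tokens + n_new_tokens <= max_context


def generate_text(prompt: str, max_new_tokens: int = 128) -> str:
    """Lazily load the model and generate a completion (downloads weights)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt")
    if not fits_in_context(inputs["input_ids"].shape[1], max_new_tokens):
        raise ValueError("prompt too long for the 32k context window")

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

As a base (non-instruct) model, it completes raw text rather than following chat-formatted prompts, so plain-text prompting or further fine-tuning is the expected workflow.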

Popular Sampler Settings

The top three parameter combinations used by Featherless users for this model tune the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
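The sampler parameters above can be collected into a single configuration. The values below are hypothetical examples, not recommended defaults, and `validate_sampler` is an illustrative helper; note that `frequency_penalty` and `presence_penalty` are OpenAI-API-style parameters, while the rest map directly onto `transformers` `generate()` keyword arguments.

```python
# Illustrative sampler configuration covering the parameters listed above.
# Values are hypothetical examples, not recommended defaults.

SAMPLER_SETTINGS = {
    "temperature": 0.7,         # softmax temperature; lower = more deterministic
    "top_p": 0.9,               # nucleus sampling cutoff
    "top_k": 40,                # sample only from the 40 most likely tokens
    "frequency_penalty": 0.0,   # OpenAI-style penalty scaled by repeat count
    "presence_penalty": 0.0,    # OpenAI-style flat penalty on any reuse
    "repetition_penalty": 1.1,  # multiplicative penalty; >1 discourages repeats
    "min_p": 0.05,              # drop tokens below min_p * top token probability
}


def validate_sampler(settings: dict) -> list[str]:
    """Return a list of problems with a sampler config (empty if valid)."""
    problems = []
    if settings.get("temperature", 1.0) <= 0:
        problems.append("temperature must be > 0")
    if not 0 < settings.get("top_p", 1.0) <= 1:
        problems.append("top_p must be in (0, 1]")
    if settings.get("top_k", 0) < 0:
        problems.append("top_k must be >= 0")
    if settings.get("repetition_penalty", 1.0) <= 0:
        problems.append("repetition_penalty must be > 0")
    if not 0 <= settings.get("min_p", 0.0) <= 1:
        problems.append("min_p must be in [0, 1]")
    return problems
```

A validated dict like this can be unpacked into `model.generate(**inputs, temperature=..., top_p=..., ...)` or sent as the body of an OpenAI-compatible API request, depending on how the model is served.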