LucasJYH/Qwen3-1.7B-Base

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 16, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

LucasJYH/Qwen3-1.7B-Base is a 1.7-billion-parameter causal language model from the Qwen3 series, pre-trained on 36 trillion tokens spanning 119 languages with a 32,768-token context length. It incorporates architectural refinements such as QK layernorm and a three-stage pre-training process covering general language, reasoning (STEM, coding), and long-context comprehension. The model is designed for broad language modeling and general knowledge acquisition, with enhancements for stability and performance across model scales.

Qwen3-1.7B-Base Overview

Qwen3-1.7B-Base is a 1.7 billion parameter causal language model, part of the latest Qwen3 series. This model builds upon significant advancements in training data, architecture, and optimization, offering improved stability and performance over previous generations. It features a substantial 32,768 token context length, making it suitable for tasks requiring extensive contextual understanding.
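As a base (non-instruct) checkpoint, the model completes raw text rather than following chat turns. Below is a minimal loading-and-generation sketch using the Hugging Face transformers API, assuming the repository follows the standard causal-LM layout; the prompt and generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LucasJYH/Qwen3-1.7B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",           # place weights on available GPU(s)/CPU
)

# A base model is a plain text completer: prompt it with a prefix, not chat turns.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```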

Key Capabilities

  • Expanded Multilingual Pre-training: Trained on an extensive corpus of 36 trillion tokens covering 119 languages, tripling the language coverage of Qwen2.5. The dataset includes a rich mix of high-quality data for coding, STEM, reasoning, and multilingual tasks.
  • Advanced Training Techniques: Incorporates architectural refinements such as QK layernorm for enhanced stability and overall performance (see the structural sketch after this list).
  • Three-stage Pre-training: Utilizes a structured pre-training approach:
    • Stage 1: Focuses on broad language modeling and general knowledge.
    • Stage 2: Improves reasoning skills, including STEM, coding, and logical reasoning.
    • Stage 3: Enhances long-context comprehension by extending training sequence lengths.
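For readers curious what the QK-layernorm refinement looks like structurally, here is an illustrative PyTorch sketch of an attention block that normalizes queries and keys per head before the dot product, which bounds the scale of the attention logits. This is a simplified, assumed structure for exposition (it omits rotary embeddings and KV caching), not this repository's actual implementation; it requires torch.nn.RMSNorm (PyTorch ≥ 2.4).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.o_proj = nn.Linear(dim, dim, bias=False)
        # QK layernorm: normalize queries and keys per head so the
        # attention logits stay well-scaled, stabilizing training.
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim)
        k = self.k_proj(x).view(b, t, self.num_heads, self.head_dim)
        v = self.v_proj(x).view(b, t, self.num_heads, self.head_dim)
        q, k = self.q_norm(q), self.k_norm(k)  # the QK layernorm step
        q, k, v = (z.transpose(1, 2) for z in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```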

Good for

  • Applications requiring robust general language understanding and generation.
  • Tasks benefiting from a broad multilingual knowledge base.
  • Scenarios where a substantial context window (32,768 tokens) is advantageous for processing longer inputs; a token-budget sketch follows below.
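
To make use of the full window, it helps to check how much of it a given input consumes before generation. The snippet below is a hedged sketch: the file path is a placeholder, and truncating to 32,768 tokens is one policy among several.

```python
# Budgeting the 32,768-token context window; the file path is a
# placeholder and truncation is an assumed policy, not a recommendation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LucasJYH/Qwen3-1.7B-Base")

with open("long_report.txt", encoding="utf-8") as f:
    document = f.read()

encoded = tokenizer(document, truncation=True, max_length=32768)
n_tokens = len(encoded["input_ids"])
print(f"Input uses {n_tokens} of the 32,768-token context window.")
```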