Qwen/Qwen3.5-4B-Base

Vision | Concurrency Cost: 1 | Model Size: 4.5B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Feb 27, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

Qwen/Qwen3.5-4B-Base is a 4.5 billion parameter causal language model developed by Qwen, built on a unified vision-language foundation and an efficient hybrid architecture. It supports a native context length of 262,144 tokens and is designed for fine-tuning, in-context learning, and research. It reaches cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agent, and visual understanding benchmarks.


Qwen3.5-4B-Base Overview

Qwen3.5-4B-Base is a 4.5 billion parameter causal language model developed by Qwen, built upon a unified vision-language foundation. It integrates advances in multimodal learning, architectural efficiency, and scalable reinforcement learning. It is primarily intended for fine-tuning, in-context learning experiments, and other research or development purposes, rather than direct interactive use.

Key Capabilities and Enhancements

  • Unified Vision-Language Foundation: Achieves strong performance across reasoning, coding, agent tasks, and visual understanding benchmarks, demonstrating cross-generational parity with Qwen3 and surpassing Qwen3-VL models.
  • Efficient Hybrid Architecture: Utilizes Gated Delta Networks combined with sparse Mixture-of-Experts for high-throughput inference with optimized latency and cost.
  • Scalable RL Generalization: Features reinforcement learning scaled across millions of agent environments, enhancing robust real-world adaptability.
  • Global Linguistic Coverage: Expanded support for 201 languages and dialects, facilitating inclusive worldwide deployment.
  • Next-Generation Training Infrastructure: Achieves near-100% multimodal training efficiency compared to text-only training, supported by asynchronous RL frameworks.
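To make the "Gated Delta Networks" item in the list above more concrete, here is a minimal pure-Python sketch of one recurrent step of the gated delta rule as it is commonly written in the Gated DeltaNet literature: S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T. This is an illustration under that assumption, not Qwen's implementation; the function names `gated_delta_step` and `read` are hypothetical, and real layers use batched, chunked kernels rather than Python loops.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One step of the gated delta rule (illustrative sketch only).

    S     : d_v x d_k state matrix as a list of rows
    k, v  : key (length d_k) and value (length d_v) vectors
    alpha : decay gate in [0, 1]; 1.0 keeps the old state, 0.0 forgets it
    beta  : write strength in [0, 1]; 0.0 writes nothing
    """
    d_v, d_k = len(S), len(k)
    # Value currently stored under key k, i.e. S @ k.
    old = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Erase the old association along k, decay, then write the new one.
    return [
        [alpha * (S[i][j] - beta * old[i] * k[j]) + beta * v[i] * k[j]
         for j in range(d_k)]
        for i in range(d_v)
    ]


def read(S, q):
    """Query the state: returns S @ q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


# Usage: write a value under a key, then overwrite it under the same key.
state = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
state = gated_delta_step(state, key, [3.0, 4.0], alpha=1.0, beta=1.0)
first_read = read(state, key)
state = gated_delta_step(state, key, [5.0, 6.0], alpha=1.0, beta=1.0)
second_read = read(state, key)
```

With a unit-norm key and beta = 1, the second write replaces the stored value instead of adding to it, which is what distinguishes the delta rule from plain linear attention. The state has a fixed size regardless of sequence length, which is one reason such layers are attractive for long contexts.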

Technical Specifications

This model has a native context length of 262,144 tokens (2^18), extensible up to 1,010,000 tokens. Its 32 layers combine Gated DeltaNet and Gated Attention mechanisms. The model's design allows for efficient LoRA-style PEFT without fine-tuning the embeddings, which is a significant saving given its larger vocabulary.
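A rough back-of-the-envelope calculation shows why leaving the embeddings frozen matters for LoRA-style fine-tuning. Only the layer count (32) comes from this page; the hidden size, vocabulary size, rank, and number of adapted matrices per layer below are hypothetical placeholders chosen for illustration, not the model's published configuration.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns two low-rank factors, A (d_in x rank) and B (rank x d_out).
    """
    return rank * (d_in + d_out)


# Hypothetical placeholder dimensions (assumptions, not the real config).
HIDDEN_SIZE = 2560
VOCAB_SIZE = 250_000
RANK = 16
ADAPTED_MATRICES_PER_LAYER = 4  # simplification: all treated as hidden x hidden

NUM_LAYERS = 32  # from the specification above

per_matrix = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
lora_total = per_matrix * ADAPTED_MATRICES_PER_LAYER * NUM_LAYERS

# Full fine-tuning of the input embedding table would train every entry.
embedding_total = VOCAB_SIZE * HIDDEN_SIZE

print(f"LoRA adapters:   {lora_total:,} trainable parameters")
print(f"Embedding table: {embedding_total:,} parameters")
```

Under these assumed numbers, the embedding table alone is dozens of times larger than all the adapters combined, so a setup that does not need to train it keeps memory and optimizer state small.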

For more details, refer to the Qwen3.5 blog post.