Qwen/Qwen3.5-397B-A17B

Parameters: 397B total (17B active)
Quantization: FP8
Released: Feb 16, 2026
License: apache-2.0
Overview

Qwen3.5-397B-A17B: A Unified Multimodal Agent

Qwen3.5-397B-A17B is a causal language model with a vision encoder from Qwen, designed for advanced multimodal and agentic applications. It has 397 billion total parameters, of which 17 billion are activated per token, and uses an efficient hybrid architecture combining Gated Delta Networks with a sparse Mixture-of-Experts for fast, low-cost inference. The model natively supports a context length of 262,144 tokens, extensible to 1,010,000 tokens with YaRN scaling, enabling ultra-long texts and complex long-horizon tasks.
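
Context extension beyond the native window is typically switched on via a `rope_scaling` entry in the model's `config.json`. The fragment below is a sketch following the convention used by recent Qwen releases; the exact key names and scaling factor should be checked against the official model card (a factor of 4.0 over the native 262,144-token window yields roughly the 1M-token figure quoted above):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

Note that YaRN scaling is usually applied statically, so it is best enabled only when workloads actually need the longer window.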

Key Capabilities

  • Unified Vision-Language Foundation: Achieves strong performance across reasoning, coding, agents, and visual understanding benchmarks through early fusion training on multimodal tokens.
  • Efficient Hybrid Architecture: Employs Gated Delta Networks and sparse Mixture-of-Experts for high-throughput inference with reduced latency and cost.
  • Scalable RL Generalization: Benefits from reinforcement learning scaled across millions of agent environments, enhancing real-world adaptability.
  • Global Linguistic Coverage: Supports 201 languages and dialects, facilitating inclusive worldwide deployment.
  • Extended Context Handling: Natively processes up to 262,144 tokens, with extensibility to over 1 million tokens for long-horizon tasks.
  • Agentic Excellence: Demonstrates strong tool-calling capabilities, optimized for building agent applications.
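
The tool-calling flow above can be sketched as a minimal round trip. This example assumes an OpenAI-compatible serving layer in front of the model; the tool name `get_weather`, its schema, and the shape of the fake response are illustrative, not part of the model's API:

```python
import json

# A tool the agent may call, described in the common JSON-schema style
# accepted by OpenAI-compatible endpoints (names here are illustrative).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch_tool_call(tool_call, registry):
    """Route a model-emitted tool call to a local Python function."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return registry[name](**args)

# Local stub backing the schema above.
def get_weather(city):
    return {"city": city, "temp_c": 21}

# A response shaped like what an OpenAI-compatible server returns
# when the model decides to call a tool (hypothetical payload).
fake_tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Lisbon"}'},
}

result = dispatch_tool_call(fake_tool_call, {"get_weather": get_weather})
print(result)  # {'city': 'Lisbon', 'temp_c': 21}
```

In a real agent loop, `result` would be appended to the conversation as a tool message and the model queried again until it produces a final answer.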

Good for

  • Complex Multimodal Reasoning: Ideal for tasks requiring both visual and linguistic understanding, such as STEM problems with diagrams or document analysis.
  • Agent Development: Suited for building sophisticated AI agents that can interact with environments and utilize tools effectively.
  • Ultra-Long Document Processing: Excellent for applications needing to process and understand very long texts, like legal documents or extensive research papers.
  • Global Applications: Its broad linguistic support makes it suitable for international deployments and diverse user bases.
  • High-Throughput Inference: The efficient architecture is beneficial for production environments requiring fast and cost-effective model serving.
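
For high-throughput production serving, an engine such as vLLM is a common choice for MoE models of this size. The command below is a deployment sketch, not an official recipe: the parallelism degree and context cap are assumptions to adjust for your hardware, and the tool-parser choice should be verified against the model's documentation:

```
# Illustrative vLLM deployment across 8 GPUs.
# --max-model-len caps the context; raise it only with YaRN scaling enabled.
vllm serve Qwen/Qwen3.5-397B-A17B \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

This exposes an OpenAI-compatible endpoint, so standard client libraries and the tool-calling pattern described above work unchanged.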