Qwen3.5-397B-A17B: A Multimodal Agent Foundation Model
Qwen3.5-397B-A17B is a powerful multimodal causal language model developed by Qwen, with 397 billion total parameters, of which 17 billion are activated per token. The model integrates a unified vision-language foundation with an efficient hybrid architecture that combines Gated Delta Networks and sparse Mixture-of-Experts for high-throughput inference.
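The sparse Mixture-of-Experts mechanism mentioned above is what lets a 397B-parameter model activate only ~17B parameters per token. A minimal sketch of top-k expert routing is shown below; all sizes are toy illustrations, not the model's real configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # toy expert count (the real model's is not stated here)
TOP_K = 2       # experts activated per token
D_MODEL = 16    # toy hidden size

def moe_forward(x, router_w, expert_ws):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                            # (tokens, experts)
    topk = np.argsort(logits, axis=-1)[:, -TOP_K:]   # chosen expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax over only the selected experts' logits.
        sel = logits[t, topk[t]]
        gates = np.exp(sel - sel.max())
        gates /= gates.sum()
        for gate, e in zip(gates, topk[t]):
            out[t] += gate * (x[t] @ expert_ws[e])
    return out, topk

x = rng.standard_normal((4, D_MODEL))
router_w = rng.standard_normal((D_MODEL, N_EXPERTS))
expert_ws = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL))

# Only TOP_K of N_EXPERTS expert FFNs run per token, so compute scales with
# activated parameters rather than total parameters.
y, chosen = moe_forward(x, router_w, expert_ws)
```

In a production model the loop is replaced by batched gather/scatter kernels, but the routing logic is the same: a small router picks a few experts per token, and only those experts' weights participate in the forward pass.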
Key Capabilities
- Unified Vision-Language Understanding: Achieves strong performance across reasoning, coding, agentic, and visual-understanding benchmarks, outperforming the previous Qwen3 and Qwen3-VL generations.
- Efficient Hybrid Architecture: Utilizes Gated Delta Networks and sparse Mixture-of-Experts for optimized inference speed and cost.
- Scalable RL Generalization: Benefits from reinforcement learning scaled across millions of agent environments, enhancing real-world adaptability.
- Extensive Multilingual Support: Offers expanded linguistic coverage for 201 languages and dialects, facilitating global deployment.
- Ultra-Long Context: Natively supports a context length of 262,144 tokens, extensible up to 1,010,000 tokens using techniques like YaRN.
- Agentic Functionality: Excels in tool calling, supporting frameworks like Qwen-Agent and Qwen Code for building advanced agent applications.
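The ultra-long-context point above can be made concrete with a YaRN-style RoPE-scaling configuration. This is a hedged sketch: the field names follow the `rope_scaling` convention used by Hugging Face transformers for YaRN, and the scaling factor is illustrative rather than an official value for this model.

```python
# Extending context beyond the native window with YaRN-style RoPE scaling.
# Field names follow the common Hugging Face `rope_scaling` convention;
# the factor value here is illustrative, not an official recommendation.

NATIVE_CONTEXT = 262_144     # native context length from the model card
TARGET_CONTEXT = 1_010_000   # extended length claimed via YaRN

rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,           # 262,144 x 4 = 1,048,576 >= 1,010,000
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

extended = int(NATIVE_CONTEXT * rope_scaling["factor"])
assert extended >= TARGET_CONTEXT
```

Static YaRN scaling applies the factor to all inputs, so it is usually enabled only when prompts actually exceed the native window.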
Good For
- Complex Multimodal Tasks: Ideal for applications requiring deep understanding and generation across both text and visual inputs, including video.
- High-Performance Inference: Suitable for scenarios demanding low-latency, cost-efficient serving, particularly with the recommended SGLang and vLLM frameworks.
- Global Applications: Its broad linguistic support makes it well-suited for international deployments and diverse user bases.
- Agent Development: A strong choice for building sophisticated AI agents that can interact with tools and environments effectively.
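To illustrate the agent-development point, here is a minimal sketch of the dispatch side of tool calling. The JSON shape mirrors the common OpenAI-style tool-call format that OpenAI-compatible servers such as SGLang and vLLM can emit; `get_weather` is a hypothetical local tool, and the model message is fabricated for illustration, not real model output.

```python
import json

def get_weather(city: str) -> str:
    # Stand-in tool; a real agent would call an actual weather API here.
    return f"22C and clear in {city}"

TOOLS = {"get_weather": get_weather}

# A model response containing one tool call (illustrative, not real output).
model_message = {
    "tool_calls": [
        {"function": {"name": "get_weather",
                      "arguments": json.dumps({"city": "Paris"})}}
    ]
}

def dispatch(message):
    """Run each requested tool and collect results to feed back to the model."""
    results = []
    for call in message.get("tool_calls", []):
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append(fn(**args))
    return results

print(dispatch(model_message))  # prints ['22C and clear in Paris']
```

In a full agent loop, each result would be appended to the conversation as a tool message and the model queried again; frameworks like Qwen-Agent wrap this loop so applications only register the tool functions.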