Qwen/Qwen3.5-397B-A17B
Qwen3.5-397B-A17B is a causal language model with a vision encoder developed by Qwen, featuring 397 billion total parameters with 17 billion activated. It utilizes an efficient hybrid architecture combining Gated Delta Networks and sparse Mixture-of-Experts for high-throughput inference. The model excels in unified vision-language understanding and agentic tasks, and supports a native context length of 262,144 tokens, extensible up to 1,010,000 tokens, making it suitable for complex multimodal tasks and long-horizon problem-solving across 201 languages.
Qwen3.5-397B-A17B: A Unified Multimodal Agent
Qwen3.5-397B-A17B is a powerful multimodal large language model developed by Qwen, featuring a total of 397 billion parameters with 17 billion activated. It integrates significant advancements in multimodal learning, architectural efficiency, and reinforcement learning to deliver exceptional utility and performance. The model is designed for robust real-world adaptability and global accessibility, supporting 201 languages and dialects.
Key Capabilities
- Unified Vision-Language Foundation: Achieves strong performance across reasoning, coding, agents, and visual understanding benchmarks through early fusion training on multimodal tokens.
- Efficient Hybrid Architecture: Employs Gated Delta Networks and sparse Mixture-of-Experts for high-throughput inference with optimized latency and cost.
- Scalable RL Generalization: Benefits from reinforcement learning scaled across millions of agent environments, enhancing adaptability to complex tasks.
- Global Linguistic Coverage: Supports 201 languages and dialects, ensuring nuanced cultural and regional understanding.
- Ultra-Long Context: Natively handles up to 262,144 tokens, extensible to 1,010,000 tokens with YaRN scaling, ideal for long-horizon tasks.
- Agentic Usage: Excels in tool calling, with recommended integration via Qwen-Agent for building agent applications and Qwen Code for terminal-based code assistance.
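The YaRN extension mentioned above amounts to scaling rotary position embeddings by the ratio of target to native context. A minimal sketch, assuming this model follows the `rope_scaling` config convention of earlier Qwen releases (the exact keys for this model are an assumption, not confirmed by its config):

```python
# Sketch: deriving a YaRN rope_scaling entry to stretch the native
# 262,144-token window toward 1,010,000 tokens. The config keys mirror
# the convention of earlier Qwen releases and are an assumption here.
NATIVE_CONTEXT = 262_144
TARGET_CONTEXT = 1_010_000

# YaRN stretches the rotary embedding period by target / native.
factor = TARGET_CONTEXT / NATIVE_CONTEXT  # roughly 3.85

rope_scaling = {
    "rope_type": "yarn",
    "factor": round(factor, 2),
    "original_max_position_embeddings": NATIVE_CONTEXT,
}
print(rope_scaling)
```

In practice this fragment would be merged into the model's `config.json` (or passed to the serving framework) only when prompts actually exceed the native window, since static scaling can slightly degrade short-context quality.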
Good for
- Complex Multimodal Reasoning: Ideal for tasks requiring deep understanding and reasoning across both visual and textual inputs, such as STEM problems with diagrams or document analysis.
- High-Throughput Inference: Its efficient hybrid architecture makes it suitable for applications demanding fast and cost-effective model responses.
- Long-Context Applications: Excellent for processing and generating content in scenarios with extensive context, like legal documents, research papers, or detailed conversations.
- Multilingual Applications: Supports a vast array of languages, making it a strong choice for global deployments and culturally sensitive interactions.
- Agent Development: Optimized for tool use and agentic workflows, enabling automation and complex task execution in various environments.
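As a concrete illustration of the agentic usage described above, the sketch below builds an OpenAI-style chat request carrying one tool definition, the format commonly accepted by OpenAI-compatible servers for Qwen models. The `get_weather` tool and its schema are hypothetical examples; only payload construction is shown, not the network call:

```python
import json

# Hypothetical tool in the OpenAI function-calling schema; the model
# can elect to emit a tool call against this definition.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body for an OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "Qwen/Qwen3.5-397B-A17B",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(payload, indent=2))
```

For full agent loops (executing the tool, feeding results back, iterating), the Qwen-Agent framework recommended above wraps this request/response cycle rather than requiring it to be hand-rolled.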