Qwen3.5-9B: A Multimodal Agent Foundation Model

Qwen3.5-9B is a 9-billion-parameter multimodal large language model developed by Qwen. It combines advances in multimodal learning, architectural efficiency, and reinforcement learning.

Key Capabilities & Features

Unified Vision-Language Foundation: Achieves strong performance in reasoning, coding, agent tasks, and visual understanding through early fusion training on multimodal tokens.
Efficient Hybrid Architecture: Utilizes Gated Delta Networks combined with sparse Mixture-of-Experts for high-throughput inference with minimal latency.
Scalable RL Generalization: Features reinforcement learning scaled across millions of agent environments, enhancing real-world adaptability.
Global Linguistic Coverage: Supports 201 languages and dialects, enabling broad deployment with nuanced cultural understanding.
Extended Context Length: Natively handles up to 262,144 tokens, extensible to 1,010,000 tokens using YaRN scaling techniques.
Tool Calling: Supports tool calling; the recommended integration paths are Qwen-Agent and Qwen Code.
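
The context-length figures above imply a specific scaling ratio. The following is a minimal sketch of that arithmetic, not the model's official configuration: the helper name `yarn_factor` is hypothetical, and the dictionary keys follow the `rope_scaling` convention used by Hugging Face Transformers for other models, which is assumed rather than confirmed for Qwen3.5-9B.

```python
# Context sizes quoted in the feature list above.
NATIVE_CONTEXT = 262_144
EXTENDED_CONTEXT = 1_010_000


def yarn_factor(target_tokens: int, native_tokens: int = NATIVE_CONTEXT) -> float:
    """Return the smallest scaling factor that stretches the native
    window far enough to cover target_tokens (hypothetical helper)."""
    if target_tokens <= native_tokens:
        return 1.0  # no scaling needed inside the native window
    return target_tokens / native_tokens


factor = yarn_factor(EXTENDED_CONTEXT)
print(round(factor, 2))  # 3.85

# Assumed config layout (Hugging Face style keys, not confirmed for this model).
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": NATIVE_CONTEXT,
}
```

Inputs that fit inside the native window yield a factor of 1.0 here, which reflects the common practice of enabling scaling only when long inputs actually require it.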

Performance Highlights

Reported benchmark scores for Qwen3.5-9B:

Language: Achieves 82.5 on MMLU-Pro, 91.1 on MMLU-Redux, and 88.2 on C-Eval.
Instruction Following: Scores 91.5 on IFEval and 64.5 on IFBench.
Long Context: Attains 63.0 on AA-LCR and 55.2 on LongBench v2.
Vision Language: Scores 78.4 on MMMU, 78.9 on MathVision, and 90.1 on MMBench.
General Agent: Achieves 66.1 on BFCL-V4 and 79.1 on TAU2-Bench, indicating strong agentic capabilities.

When to Use This Model

Qwen3.5-9B is ideal for applications requiring:

Multimodal understanding: Processing and reasoning over both text and image/video inputs.
Complex agentic tasks: Building AI agents that can interact with environments and use tools effectively.
Long context processing: Handling extensive documents or conversations with its large context window.
Global applications: Leveraging its broad multilingual support for diverse user bases.
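
For the agentic use case, the application side of tool calling can be sketched as follows. This is an illustration under stated assumptions, not the Qwen-Agent or Qwen Code API: the `get_weather` tool, the `dispatch` helper, and the JSON shape of the tool call are all hypothetical, and the schema follows the widely used JSON Schema function-description style.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool that returns a canned answer."""
    return f"Sunny in {city}"


# Registry mapping tool names to Python callables (illustrative only).
TOOLS = {"get_weather": get_weather}

# Description of the tool that would be passed to the model (assumed format).
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def dispatch(tool_call_json: str) -> str:
    """Parse a tool call emitted by the model and run the matching tool."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# A tool call as the model might emit it (assumed shape).
result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
print(result)
```

In a real agent loop, the string returned by `dispatch` would be appended to the conversation as a tool message so the model can use it in its next turn.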
