hamishivi/Qwen3.5-9B

  • Tags: Vision, Open Weights, Cold
  • Concurrency Cost: 1
  • Model Size: 9B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: May 15, 2026
  • License: apache-2.0
  • Architecture: Transformer
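
The Model Size and Quant fields give a rough estimate of the memory the weights alone occupy: parameter count times bytes per parameter. This sketch assumes the nominal 9B count from the model name and one byte per FP8 weight. It ignores the KV cache, activations, and any tensors a checkpoint keeps at higher precision, so the real serving footprint will be larger.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: every weight is stored at the listed precision. Real FP8
# checkpoints may keep some tensors (e.g. norms) in higher precision,
# and KV cache / activations are not counted here.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "fp8": 1.0,
}


def weight_bytes(num_params: float, dtype: str) -> float:
    """Approximate size of the weights in bytes."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_gib(num_bytes: float) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    params = 9e9  # nominal "9B" from the model name, not an exact count
    for dtype in ("bf16", "fp8"):
        b = weight_bytes(params, dtype)
        print(f"{dtype}: {b / 1e9:.1f} GB ({to_gib(b):.2f} GiB)")
```

For the nominal 9B parameters this prints 18.0 GB (16.76 GiB) for BF16 and 9.0 GB (8.38 GiB) for FP8, so FP8 roughly halves the weight memory relative to BF16.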

Qwen3.5-9B is a 9-billion-parameter multimodal large language model developed by Qwen, built on a unified vision-language foundation and an efficient hybrid architecture. It is strong in reasoning, coding, agentic tasks, and visual understanding, supports a native context length of 262,144 tokens, and covers 201 languages and dialects. It is designed for high-throughput inference and robust real-world adaptability, which makes it suitable for complex multimodal applications.


Qwen3.5-9B: A Multimodal Agent Foundation Model

Qwen3.5-9B is a 9-billion-parameter multimodal large language model developed by Qwen. It combines advances in multimodal learning, architectural efficiency, and reinforcement learning.

Key Capabilities & Features

  • Unified Vision-Language Foundation: Early-fusion training on multimodal tokens yields strong performance across reasoning, coding, agent tasks, and visual understanding.
  • Efficient Hybrid Architecture: Combines Gated Delta Networks with sparse Mixture-of-Experts layers for high-throughput, low-latency inference.
  • Scalable RL Generalization: Reinforcement learning is scaled across million-agent environments to improve robustness and adaptability in real-world use.
  • Global Linguistic Coverage: Supports 201 languages and dialects, enabling inclusive worldwide deployment.
  • Ultra-Long Context: Natively handles up to 262,144 tokens, extensible to 1,010,000 tokens using YaRN scaling techniques.
  • Multimodal Input: Supports text, image, and video inputs.
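
The card names Gated Delta Networks but does not describe what such a layer computes. As a rough illustration, here is a minimal single-head sketch of the gated delta rule in its recurrent form, following the published Gated DeltaNet formulation: S_t = α_t S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ and o_t = S_t q_t. In this sketch α_t (memory decay) and β_t (write strength) are passed in as plain numbers; a real layer computes them from each token, and production kernels use a chunked parallel form with extra components that are not shown. How these layers are combined with the Mixture-of-Experts blocks in Qwen3.5-9B is also outside the scope of this sketch.

```python
# Recurrent form of the gated delta rule for a single head, in pure Python:
#   S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t * v_t k_t^T
#   o_t = S_t q_t
# S is a d_v x d_k state matrix stored as a list of rows.
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(x: Vector) -> Vector:
    """Scale a key to unit length (guards against a zero vector)."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product m @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def gated_delta_step(state: Matrix, q: Vector, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Tuple[Matrix, Vector]:
    """Process one token; alpha in (0, 1] decays memory, beta in [0, 1] sets write strength."""
    k = l2_normalize(k)
    recalled = matvec(state, k)  # S_{t-1} k_t: value currently stored under key k_t
    new_state = [
        # Row i of alpha * (S - beta * (S k) k^T) + beta * v k^T
        [alpha * (row[j] - beta * recalled[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i, row in enumerate(state)
    ]
    return new_state, matvec(new_state, q)


if __name__ == "__main__":
    state = [[0.0] * 3 for _ in range(2)]  # d_v = 2, d_k = 3
    key = [1.0, 0.0, 0.0]
    state, out = gated_delta_step(state, key, key, [1.0, 2.0], alpha=1.0, beta=1.0)
    print(out)  # [1.0, 2.0]
    state, out = gated_delta_step(state, key, key, [5.0, -1.0], alpha=1.0, beta=1.0)
    print(out)  # [5.0, -1.0]: the new value overwrites the old one
```

Writing a new value under the same key replaces the stored one (the output is [5.0, -1.0], not the [6.0, 1.0] a purely additive linear-attention update would give), while α_t < 1 uniformly fades older memories. Because the state has a fixed size, the per-token cost does not grow with context length, which is the efficiency argument behind such hybrid designs.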

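YaRN extends the rotary position embeddings by a scale factor s = target length / native length and applies an attention-temperature correction of 0.1 · ln(s) + 1. The sketch below computes both for the context figures quoted above. The dictionary keys follow the Hugging Face `rope_scaling` convention used by earlier Qwen releases. Whether Qwen3.5-9B reads the same keys, and which factor its maintainers recommend, are assumptions to check against the official model documentation before editing any config.

```python
# Arithmetic behind YaRN context extension.
# Assumption: a Hugging Face-style "rope_scaling" dict with the keys
# rope_type / factor / original_max_position_embeddings.
import math


def yarn_factor(target_len: int, native_len: int) -> float:
    """Scale factor that stretches the native RoPE window to target_len."""
    if target_len <= native_len:
        return 1.0  # no extension needed
    return target_len / native_len


def yarn_attention_scale(factor: float) -> float:
    """YaRN attention-temperature correction: 0.1 * ln(s) + 1 (1.0 when s <= 1)."""
    if factor <= 1.0:
        return 1.0
    return 0.1 * math.log(factor) + 1.0


def rope_scaling_config(target_len: int, native_len: int) -> dict:
    """Build a rope_scaling dict (key names are an assumption, see above)."""
    return {
        "rope_type": "yarn",
        "factor": yarn_factor(target_len, native_len),
        "original_max_position_embeddings": native_len,
    }


if __name__ == "__main__":
    cfg = rope_scaling_config(1_010_000, 262_144)
    print(cfg)
    print(round(yarn_attention_scale(cfg["factor"]), 4))
```

For a 1,010,000-token target over the 262,144-token native window, the factor is 1,010,000 / 262,144 ≈ 3.853 and the attention scale is about 1.135. Static scaling of this kind applies to every input, so it is usually enabled only when long inputs are actually expected.
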
Performance Highlights

In the reported benchmarks, Qwen3.5-9B often outperforms earlier Qwen3 models and other models of similar size across language, vision-language, and agentic tasks:

  • Language: Achieves 82.5 on MMLU-Pro, 88.2 on C-Eval, and 91.5 on IFEval.
  • Vision Language: Scores 78.4 on MMMU, 78.9 on MathVision, and 90.1 on MMBench (EN-DEV-v1.1).
  • Agentic Capabilities: Scores 66.1 on BFCL-V4 and 79.1 on TAU2-Bench (general agent benchmarks), and 45.6 on TIR-Bench for tool calling.

Good for

  • Developing multimodal applications requiring advanced reasoning and visual understanding.
  • Applications needing extensive language support and long context processing.
  • Building AI agents with strong tool-calling capabilities.
  • High-throughput inference scenarios where efficiency is critical.
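
The card lists tool calling as supported but does not document the wire format. Assuming an OpenAI-compatible chat API, where each tool call arrives as a function name plus a JSON-encoded arguments string, the client-side half of an agent loop might look like the following sketch. `get_weather`, its schema, and its canned result are hypothetical placeholders, not part of the model or any real service.

```python
# Minimal dispatcher for an OpenAI-style tool call.
# Assumption: calls arrive as {"name": ..., "arguments": "<JSON string>"}.
# get_weather is a hypothetical local tool used only for illustration.
import json
from typing import Any, Callable, Dict


def get_weather(city: str, unit: str = "celsius") -> Dict[str, Any]:
    """Hypothetical tool: returns a canned reading instead of calling a real API."""
    return {"city": city, "temperature": 21, "unit": unit}


TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}

# JSON-schema description that would be sent to the model in the request's tool list.
TOOL_SCHEMAS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]


def dispatch(tool_call: Dict[str, str]) -> str:
    """Run the requested tool and return its result as a JSON string."""
    name = tool_call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(tool_call["arguments"] or "{}")
        result = TOOLS[name](**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names: report back instead of crashing.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


if __name__ == "__main__":
    call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
    print(dispatch(call))
```

The returned JSON string would be sent back to the model as a tool-result message so it can continue the turn. Returning errors as data rather than raising lets the model see what went wrong and retry with corrected arguments.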