unsloth/Qwen3.5-27B

VISIONConcurrency Cost:2Model Size:27BQuant:FP8Ctx Length:32kPublished:Feb 24, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Qwen3.5-27B is a 27 billion parameter multimodal causal language model developed by Qwen, featuring a unified vision-language foundation and an efficient hybrid architecture. It excels in reasoning, coding, agent tasks, and visual understanding, supporting a native context length of 262,144 tokens and extensible up to 1,010,000 tokens. The model offers expanded linguistic support for 201 languages and dialects, making it suitable for global deployment and complex multimodal applications.

Loading preview...

Qwen3.5-27B: A Multimodal Agent with Extended Context

Qwen3.5-27B is a 27 billion parameter multimodal causal language model developed by Qwen, designed for exceptional utility and performance across various tasks. It integrates significant advancements in multimodal learning, architectural efficiency, and reinforcement learning.

Key Capabilities & Features

  • Unified Vision-Language Foundation: Achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models in reasoning, coding, agent tasks, and visual understanding through early fusion training on multimodal tokens.
  • Efficient Hybrid Architecture: Utilizes Gated Delta Networks combined with sparse Mixture-of-Experts for high-throughput inference with minimal latency.
  • Scalable RL Generalization: Features reinforcement learning scaled across million-agent environments for robust real-world adaptability.
  • Global Linguistic Coverage: Supports 201 languages and dialects, enabling inclusive worldwide deployment.
  • Extended Context Length: Natively supports 262,144 tokens, extensible up to 1,010,000 tokens using RoPE scaling techniques like YaRN.
  • Multimodal Input Support: Capable of processing text, image, and video inputs.
  • Agentic Usage: Excels in tool calling, with recommended integration via Qwen-Agent and Qwen Code for terminal-based AI agent applications.

Performance Highlights

The model demonstrates strong performance across various benchmarks, including:

  • Language: Achieves 93.2 on MMLU-Redux, 90.5 on C-Eval, and 95.0 on IFEval.
  • Coding: Scores 72.4 on SWE-bench Verified and 80.7 on LiveCodeBench v6.
  • Vision Language: Attains 82.3 on MMMU, 86.0 on MathVision, and 92.6 on MMBench.
  • Multilingualism: Scores 85.9 on MMMLU and 82.2 on MMLU-ProX (averaged across 29 languages).

Best Practices

  • Sampling Parameters: Specific temperature, top_p, top_k, min_p, presence_penalty, and repetition_penalty settings are recommended for different modes (thinking/instruct) and task types (general/precise coding/reasoning).
  • Output Length: Recommend 32,768 tokens for most queries, up to 81,920 for complex problems.
  • Standardized Output: Use prompts to standardize output formats for math problems and multiple-choice questions.
  • No Thinking Content in History: Ensure historical model output in multi-turn conversations only includes the final response.
  • Long Video Understanding: Adjust video_preprocessor_config.json for higher frame-rate sampling in hour-scale videos.