Name: unsloth/Qwen3.5-27B API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: unsloth

Qwen3.5-27B: A Multimodal Agent with Extended Context

Qwen3.5-27B is a 27 billion parameter multimodal causal language model developed by Qwen, designed for exceptional utility and performance across various tasks. It integrates significant advancements in multimodal learning, architectural efficiency, and reinforcement learning.

Key Capabilities & Features

Unified Vision-Language Foundation: Achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models in reasoning, coding, agent tasks, and visual understanding through early fusion training on multimodal tokens.
Efficient Hybrid Architecture: Utilizes Gated Delta Networks combined with sparse Mixture-of-Experts for high-throughput inference with minimal latency.
Scalable RL Generalization: Features reinforcement learning scaled across million-agent environments for robust real-world adaptability.
Global Linguistic Coverage: Supports 201 languages and dialects, enabling inclusive worldwide deployment.
Extended Context Length: Natively supports 262,144 tokens, extensible up to 1,010,000 tokens using RoPE scaling techniques like YaRN.
Multimodal Input Support: Capable of processing text, image, and video inputs.
Agentic Usage: Excels in tool calling, with recommended integration via Qwen-Agent and Qwen Code for terminal-based AI agent applications.

Performance Highlights

The model demonstrates strong performance across various benchmarks, including:

Language: Achieves 93.2 on MMLU-Redux, 90.5 on C-Eval, and 95.0 on IFEval.
Coding: Scores 72.4 on SWE-bench Verified and 80.7 on LiveCodeBench v6.
Vision Language: Attains 82.3 on MMMU, 86.0 on MathVision, and 92.6 on MMBench.
Multilingualism: Scores 85.9 on MMMLU and 82.2 on MMLU-ProX (averaged across 29 languages).

Best Practices

Sampling Parameters: Specific temperature, top_p, top_k, min_p, presence_penalty, and repetition_penalty settings are recommended for different modes (thinking/instruct) and task types (general/precise coding/reasoning).
Output Length: Recommend 32,768 tokens for most queries, up to 81,920 for complex problems.
Standardized Output: Use prompts to standardize output formats for math problems and multiple-choice questions.
No Thinking Content in History: Ensure historical model output in multi-turn conversations only includes the final response.
Long Video Understanding: Adjust video_preprocessor_config.json for higher frame-rate sampling in hour-scale videos.

Overview

Qwen3.5-27B: A Multimodal Agent with Extended Context

Key Capabilities & Features

Performance Highlights

Best Practices

Full Model Card (README)