Name: natfii/Qwen3.6-27B-VLM-Cascade API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: natfii

Overview

natfii/Qwen3.6-27B-VLM-Cascade is a 27 billion parameter vision-language model (VLM) built upon the Qwen/Qwen3.6-27B base. It features a unique "Cascade-style" post-training approach, which includes reasoning SFT (Supervised Fine-Tuning) followed by sequential RLVR (Reinforcement Learning from Vision-Reasoning) and MOPD (Model-Optimized Policy Distillation) self-distillation. This process enhances its reasoning capabilities, particularly in a "think" style, where the model can generate internal reasoning traces before providing an answer.

Key Capabilities

Advanced Reasoning: Employs a Cascade-style training method, inspired by Nemotron-Cascade-2, to excel in complex reasoning tasks, with an opt-in "thinking" mode that reveals the model's thought process.
Vision-Language Integration: Based on a VLM, it supports image-text-to-text tasks, with its vision tower frozen during post-training to preserve visual grounding.
Speculative Decoding (NEXTN): Includes a BF16 qwen3_5_mtp draft head for efficient NEXTN speculative decoding, improving inference speed without compromising output quality.
Re-quantizable Master: Provided as a BF16 master, it serves as the source for creating optimized, quantized deployment builds (e.g., NVFP4 for GB10/DGX Spark), ensuring flexibility for various hardware.
Configurable Reasoning: Offers Instruct (default) and Thinking modes, allowing users to toggle the display of the model's reasoning trace. It also includes mechanisms to prevent runaway reasoning loops.

Good For

Local/Homelab Reasoning & VLM Applications: Ideal for projects requiring advanced reasoning, vision-language understanding, and agentic/tool use in non-production environments.
Deployment Build Foundation: Excellent as a BF16 master for developers looking to re-quantize and fine-tune the model for specific deployment targets and hardware, such as NVFP4 for NVIDIA GB10/DGX Spark.
Exploring Model Reasoning: Useful for researchers and developers interested in observing and analyzing the model's internal thought processes through its configurable "thinking" mode.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)