Name: exolabs/Qwen3.6-35B-A3B-Q4KM-dequant-bf16-vllm API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: exolabs

exolabs/Qwen3.6-35B-A3B-Q4KM-dequant-bf16-vllm Overview

This model is a 35.1 billion parameter variant of the Qwen3.6-A3B architecture, prepared for inference with vLLM. It originates from a private Exolabs checkpoint that was converted from the Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf file in bartowski/Qwen_Qwen3.6-35B-A3B-GGUF.

Key Technical Details

Parameter Count: 35.1 billion.
Dequantization: The model's weights were dequantized from Q4_K_M GGUF to FP32, then transformed to the Hugging Face/vLLM layout, and finally cast to BF16 precision.
vLLM Optimization: The conversion includes specific adjustments for the Qwen3.5/3.6 Gated DeltaNet tensors, which llama.cpp stores in a different layout, so that they load correctly in vLLM.
Context Length: Supports a context length of 32768 tokens.
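
The final step of the dequantization pipeline, the FP32 to BF16 cast, can be illustrated with a minimal sketch. This is not the Exolabs conversion script: the function names are hypothetical, and round-to-nearest-even is assumed as the rounding mode because it is the common choice for such casts. The weight-size estimate treats the 35.1 billion figure as exact, which it is not.

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (hypothetical helper).

    BF16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
    bits of an FP32 value; the low 16 bits are rounded away.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep NaN a NaN instead of letting rounding turn it into infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, on the 16 discarded bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    """Expand a BF16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

def bf16_weight_bytes(param_count: int) -> int:
    """Approximate weight storage: BF16 uses 2 bytes per parameter."""
    return param_count * 2

# 0.1 is not representable in BF16; the cast lands on a nearby value.
print(bf16_bits_to_fp32(fp32_to_bf16_bits(0.1)))  # 0.10009765625
# Rough weight size for 35.1 billion parameters, in decimal GB.
print(bf16_weight_bytes(35_100_000_000) / 1e9)  # 70.2
```

Note that the cast only changes the storage format. Values produced by dequantizing Q4_K_M weights still carry the rounding introduced by the original 4-bit quantization.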

Deployment and Usage

This model is validated with vLLM 0.23.0, configured with the bfloat16 dtype, Triton-based MoE and attention backends, and Triton for Gated DeltaNet (GDN) prefill. It is intended for developers who want a dequantized Qwen3.6-A3B model for vLLM-based serving, particularly where BF16 precision and a 32768-token context window are useful.
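
As a rough sketch of what a matching launch might look like, the snippet below assembles a `vllm serve` command using the dtype and context length stated above. The helper names are hypothetical, and it assumes the model ID resolves to a repository your environment can download. The Triton backend selections are deliberately left out because the listing does not give the exact settings; consult the full model card for those.

```python
import shlex

MODEL_ID = "exolabs/Qwen3.6-35B-A3B-Q4KM-dequant-bf16-vllm"
CONTEXT_LENGTH = 32768  # tokens, as stated in the listing

def build_serve_command(model: str = MODEL_ID,
                        max_model_len: int = CONTEXT_LENGTH,
                        dtype: str = "bfloat16") -> list[str]:
    """Assemble argv for `vllm serve` (hypothetical helper).

    Only --dtype and --max-model-len are set; backend choices are
    not included because the source does not specify them.
    """
    return ["vllm", "serve", model,
            "--dtype", dtype,
            "--max-model-len", str(max_model_len)]

def fits_context(prompt_tokens: int, max_new_tokens: int,
                 limit: int = CONTEXT_LENGTH) -> bool:
    """Prompt and generated tokens share the same context window."""
    return prompt_tokens + max_new_tokens <= limit

# Print the command for copy-paste into a shell.
print(shlex.join(build_serve_command()))
```

The `fits_context` check reflects that prompt and completion tokens count against the same 32768-token window, so a 32000-token prompt leaves room for at most 768 generated tokens.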
