Name: nightmedia/Qwen3.6-35B-A3B-Fable-Holo3-Qwopus API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: nightmedia

Model Overview

nightmedia/Qwen3.6-35B-A3B-Fable-Holo3-Qwopus is a 35.1-billion-parameter language model based on the Qwen3.6 architecture. It merges armand0e/Qwen3.6-35B-A3B-Fable-5-Distill with nightmedia/Qwen3.6-35B-A3B-MTP-Holo3-Qwopus-BF16 and applies a mixed-precision quantization scheme intended to preserve reasoning capability.

Key Capabilities & Design Philosophy

Enhanced Reasoning: The model shows improved performance on reasoning benchmarks such as ARC, which is attributed to its "Deckard(qx)" mixed-precision quantization scheme.
Selective Precision Quantization: High-sensitivity pathways such as attention heads and embeddings are stored at 6-bit precision, because they are crucial for maintaining the fidelity of context selection and multi-step abstraction. General layers use 4-bit precision for efficiency.
Optimized Memory & Throughput: The qx64-hi quantization variant achieves a perplexity of 4.438 at 36.91 GB peak memory and a throughput of approximately 1466 tokens/second, balancing reasoning performance with resource efficiency.
Group Size 32: A uniform group size of 32 across all layers simplifies kernel fusion and keeps quantization noise consistent, which aids stable Multi-Token Prediction (MTP) distillation.
Functional Parallel to Holodeck Architecture: The quantization strategy mirrors a "selective precision over uniform compression" philosophy, allocating fidelity where routing is critical and compressing where redundancy exists.
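
The selective-precision idea above can be illustrated with a minimal sketch. It is not the Deckard(qx) implementation: it assumes plain per-group affine (min/max) quantization, and the function names, the toy weights, and the 32 bits of per-group metadata used in the storage estimate are hypothetical choices made for illustration only.

```python
import random


def quantize_group(values, bits):
    """Affine-quantize one group to `bits` bits, then dequantize it again."""
    levels = (1 << bits) - 1              # number of steps between min and max
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]   # integers in [0, levels]
    return [lo + c * scale for c in codes]


def quantize_tensor(weights, bits, group_size=32):
    """Quantize a flat list of weights in independent groups of `group_size`."""
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(quantize_group(weights[start:start + group_size], bits))
    return out


def max_abs_error(original, restored):
    """Largest absolute reconstruction error over all weights."""
    return max(abs(a - b) for a, b in zip(original, restored))


def effective_bits(bits, group_size=32, overhead_bits=32):
    """Average storage per weight, including per-group metadata.

    Assumption: each group stores a 16-bit scale and a 16-bit offset.
    """
    return bits + overhead_bits / group_size


# Toy weights standing in for a real layer (hypothetical data).
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(128)]

err_6bit = max_abs_error(weights, quantize_tensor(weights, bits=6))
err_4bit = max_abs_error(weights, quantize_tensor(weights, bits=4))

print("max error, 6-bit groups:", err_6bit)
print("max error, 4-bit groups:", err_4bit)
print("bits per weight, 6-bit:", effective_bits(6))
print("bits per weight, 4-bit:", effective_bits(4))
```

Each group's rounding error is bounded by half a quantization step, so the 6-bit path has a worst-case error roughly four times smaller than the 4-bit path for the same value range. Under the metadata assumption stated above, that extra fidelity costs about two additional bits per weight, which is the trade-off the scheme spends only on the sensitive pathways.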

Good For

Applications requiring strong multi-step abstraction and reasoning capabilities.
Scenarios where optimizing for reasoning benchmarks is a priority.
Environments that need a balance of reasoning performance, memory efficiency, and throughput in a 35B-parameter model.
