Name: nightmedia/Qwen3.6-35B-A3B-Qwable-Holo3-Qwopus API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: nightmedia

Model Overview

nightmedia/Qwen3.6-35B-A3B-Qwable-Holo3-Qwopus is a 35.1 billion parameter model, a NuSLERP merge of several Qwen-based architectures, including llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved, samuelcardillo/Qwopus-MoE-35B-A3B, Hcompany/Holo3-35B-A3B, and lordx64/Qwable-v1. It is designed with a 32,768 token context length.

Key Capabilities & Features

NuSLERP Merge: Combines the strengths of multiple Qwen-based models, aiming for enhanced performance across various tasks.
"Thinking Toggle" Mechanism: Integrates a unique feature allowing users to explicitly control the model's reasoning process using <|think_on|> and <|think_off|> control tokens. This enables switching between fast, direct answers and deeper, more elaborate reasoning without the model seeing the control tokens in context.
Preserve Thinking Flag: Includes <|think_forget|> and <|think_remember|> tags to manage the preserve_thinking flag, offering further control over the model's internal state during reasoning.
Performance Metrics: Benchmarks provided for various tasks (arc, boolq, hswag, obkqa, piqa, wino) across different quantization levels (bf16, mxfp8, qx86-hi, qx64-hi, mxfp4), showing competitive results.

When to Use This Model

This model is particularly well-suited for applications where dynamic control over the model's reasoning depth is beneficial. Developers can leverage the "thinking toggle" for:

Code Generation & Complex Problem Solving: Use <|think_on|> for detailed, step-by-step reasoning.
Quick Q&A & Chatbots: Employ <|think_off|> for concise, immediate responses.
Adaptive AI Systems: Build systems that can adjust their response style based on user input or task requirements, switching between analytical and direct modes.

Overview

Model Overview

Key Capabilities & Features

When to Use This Model

Full Model Card (README)