Name: SC117/QwenPaw-Flash-9B-heretic API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: SC117

Overview

SC117/QwenPaw-Flash-9B-heretic is a 9 billion parameter dense language model, derived from the Qwen3.5-9B base model. It has been fine-tuned using the "Heretic" methodology and is provided in F32 (float32) safetensors format.
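Because the main weights ship in F32, each parameter takes 4 bytes, which drives the memory needed to load the model. The sketch below is a rough sizing aid. It assumes a round 9 × 10^9 parameters (the exact parameter count of this checkpoint is not stated here). It counts weights only, ignoring the MTP head, KV cache, and activations. The function name `weight_memory_gib` is hypothetical.

```python
# Back-of-the-envelope weight-memory estimate for a dense model.
# Assumption: "9B" is taken literally as 9e9 parameters; the real count
# of this checkpoint may differ slightly. Weights only: no KV cache,
# activations, or MTP head are included.

BYTES_PER_PARAM = {
    "F32": 4.0,   # float32, the precision of the safetensors release
    "BF16": 2.0,  # bfloat16, half the size of F32, for comparison
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n_params = 9e9
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(n_params, dtype):.1f} GiB")
```

Under these assumptions the F32 weights need roughly 33.5 GiB, versus about 16.8 GiB at BF16. This is one reason the GGUF versions are the practical choice on consumer hardware.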

Key Capabilities & Features

Base Model: Built on the Qwen3.5-9B architecture.
Precision: Main weights are stored in F32 (float32), i.e. 4 bytes per parameter.
Multi-Token Prediction (MTP): Includes an MTP head (extracted from Qwen3.5-9B) that lets the model predict multiple future tokens in a single forward pass. With speculative decoding, the main model verifies these drafted tokens and keeps the ones it agrees with, which speeds up generation:
- Achieves an MTP acceptance rate of approximately 43%.
- Provides a speedup of roughly 1.5-1.9x in decode throughput.
GGUF Versions: Available in both standard GGUF and MTP-enabled GGUF formats for compatibility with inference engines like llama.cpp, Ollama, and LM Studio.
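The draft-then-verify loop behind MTP speculative decoding can be illustrated with a toy, greedy version of the general technique. This is not this model's or llama.cpp's actual inference code. Assumptions: tokens are integers; `toy_target` stands in for the main model and `toy_draft` for the MTP head (in the real model the head reuses the main model's hidden states rather than being a separate function). All helper names are hypothetical. `expected_tokens_per_pass` is the standard idealized estimate that treats each draft token as accepted independently with probability `alpha`.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(target: Model, draft: Model, context: List[Token], k: int) -> List[Token]:
    """One greedy draft-then-verify step; returns the tokens emitted.

    A real engine gets all k+1 target predictions from one batched forward
    pass; here they are computed one at a time for clarity.
    """
    # 1. The cheap draft proposes k tokens autoregressively.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    # 2. Keep the longest prefix the target agrees with.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(context + accepted)
        if expected != tok:
            accepted.append(expected)  # target's token replaces the first miss
            return accepted
        accepted.append(tok)

    # 3. Every draft was accepted: the target's next prediction is a bonus token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: Model, draft: Model, prompt: List[Token],
             max_new: int, k: int) -> Tuple[List[Token], int]:
    """Generate max_new tokens; also return how many target passes were used."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        out += speculative_step(target, draft, out, k)
        passes += 1
    return out[len(prompt):len(prompt) + max_new], passes


def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Idealized tokens per target pass with i.i.d. acceptance probability alpha."""
    if alpha == 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


# Hypothetical toy models: the target counts 0..9 cyclically;
# the draft agrees except right after token 4, where it guesses 7.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 7 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, passes = generate(toy_target, toy_draft, [0], max_new=20, k=3)
    print(tokens)
    print(f"{len(tokens)} tokens in {passes} target passes")
    print(f"alpha=0.5, k=2 -> {expected_tokens_per_pass(0.5, 2):.2f} tokens/pass")
```

The output is identical to plain greedy decoding with the target alone: the draft only changes how many target passes are needed (7 instead of 20 in this toy run), never which tokens are emitted. Real-world speedups such as the 1.5-1.9x quoted above also depend on draft overhead, draft length, and how acceptance is measured, so the idealized formula is only a rough guide.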

When to Use This Model

This model is suitable for developers seeking a 9B parameter model with faster inference, particularly for text generation tasks where decode speed matters. Its MTP capability makes it a strong candidate for applications requiring efficient token generation. Consider using this model if your use case benefits from Qwen3.5-9B's capabilities combined with accelerated output generation.

Full Model Card (README)