SC117/QwenPaw-Flash-9B-heretic

Vision | Concurrency cost: 1 | Model size: 9B | Quant: FP8 | Context length: 32k | Tool calling: Supported | Published: May 21, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights

SC117/QwenPaw-Flash-9B-heretic is a 9-billion-parameter dense language model, fine-tuned from Qwen3.5-9B using the Heretic methodology. It supports Multi-Token Prediction (MTP) for accelerated inference, giving up to a 1.9x decode-throughput speedup. It is designed for general text generation, balancing output quality and speed, and is available in F32 safetensors and GGUF formats.


Overview

SC117/QwenPaw-Flash-9B-heretic is a 9 billion parameter dense language model, derived from the Qwen3.5-9B base model. It has been fine-tuned using the "Heretic" methodology and is provided in F32 (float32) safetensors format.
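To put the F32 format in perspective, the raw weight footprint is roughly the parameter count multiplied by the bytes per parameter. The sketch below assumes exactly 9 × 10^9 parameters (the nominal size). It ignores the MTP head, the KV cache, and runtime overhead. The BF16 and FP8 rows are included only for comparison.

```python
# Rough weight-memory estimate for a ~9B-parameter model at several precisions.
# Assumption: exactly 9e9 parameters (nominal size). The MTP head, KV cache,
# activations, and framework overhead are NOT included.

BYTES_PER_PARAM = {
    "F32": 4.0,   # float32, the format of the main safetensors weights
    "BF16": 2.0,  # shown for comparison only
    "FP8": 1.0,   # shown for comparison only
}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the size of the raw weights in GiB (1 GiB = 2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    n_params = 9e9
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{fmt:>5}: {weight_memory_gib(n_params, nbytes):6.1f} GiB")
```

Under these assumptions, the F32 weights come to about 33.5 GiB, compared with about 16.8 GiB at BF16 and 8.4 GiB at FP8. This is why reduced-precision formats are common for local inference.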

Key Capabilities & Features

  • Base Model: Fine-tuned from Qwen3.5-9B and shares its architecture.
  • Precision: Main weights are in F32 (float32) for high fidelity.
  • Multi-Token Prediction (MTP): Includes an MTP head (extracted from Qwen3.5-9B) that predicts several future tokens in a single forward pass. These predictions act as draft tokens for speculative decoding. The main model verifies them and keeps the ones it agrees with, which raises generation speed:
    • Achieves an MTP acceptance rate of approximately 43%.
    • Provides a speedup of roughly 1.5-1.9x in decode throughput.
  • GGUF Versions: Available in both standard GGUF and MTP-enabled GGUF formats for compatibility with inference engines like llama.cpp, Ollama, and LM Studio.
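The draft-and-verify idea behind MTP speculative decoding can be illustrated with a toy example. In this model the MTP head plays the "draft" role. The sketch below is not this model's or any engine's implementation. The integer "models" `target` and `draft` are hypothetical stand-ins, and the sketch shows greedy verification only. Sampling-based verification uses a probabilistic acceptance rule instead. Real engines also check all drafted positions in one batched forward pass rather than in a Python loop.

```python
from typing import Callable, List, Tuple

# A "model" here maps a token sequence to its greedy next token (hypothetical toy).
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 2) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding.

    The draft proposes k tokens. The target checks them in order, keeps the
    longest agreeing prefix, and always adds one token of its own: the
    correction at the first mismatch, or a bonus token if all k match.
    Returns (generated tokens, number of verification steps).
    """
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        # 1. Draft k tokens autoregressively with the cheap predictor.
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Verify with the target (one batched pass in a real engine).
        steps += 1
        ctx = list(out)
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                ctx.append(expected)  # mismatch: keep the target's token, drop the rest
                break
            ctx.append(tok)  # draft token accepted
        else:
            ctx.append(target(ctx))  # all k accepted: target adds a bonus token
        out = ctx
    return out[len(prompt):len(prompt) + max_new], steps


def greedy(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain one-token-per-step greedy decoding, for comparison."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def target(seq: List[int]) -> int:
    """Hypothetical 'main model'."""
    return (seq[-1] * 3 + 1) % 10


def draft(seq: List[int]) -> int:
    """Hypothetical 'MTP head': agrees with target except after token 3."""
    return 9 if seq[-1] == 3 else target(seq)


if __name__ == "__main__":
    spec, steps = speculative_decode(target, draft, [1], max_new=8, k=2)
    print(spec, "in", steps, "verification steps")
    print(greedy(target, [1], 8), "in 8 plain decode steps")
```

Every kept token is the target's own greedy choice, so the output matches plain greedy decoding. In this toy run it takes 4 verification steps instead of 8. The speedup depends on how often the draft is accepted, and an acceptance rate such as the roughly 43% quoted above summarizes that. This sketch does not attempt to reproduce the card's reported figures.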

When to Use This Model

This model suits developers who want a 9B-parameter model with faster inference, particularly for text-generation workloads where decode speed matters. Its MTP support makes it a strong candidate for applications that need efficient token generation. Consider it if your use case benefits from Qwen3.5-9B's capabilities and you also want faster output generation.