What the fuck is this model about?
This model, p-e-w/Qwen3-4B-Instruct-2507-heretic-v3-quantized-processing, is a decensored version of Qwen3-4B-Instruct-2507, a 4-billion-parameter instruction-tuned causal language model developed by Qwen. It was created with the Heretic v1.1.0 tool specifically to reduce content refusals. The original Qwen3-4B-Instruct-2507 is noted for significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage, along with substantially broader long-tail knowledge coverage across multiple languages.
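If you just want to poke at it, here is a minimal sketch using Hugging Face transformers. The model id comes from this card; the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: load the model and run one chat turn with transformers.
# The model id is from this card; the prompt and settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "p-e-w/Qwen3-4B-Instruct-2507-heretic-v3-quantized-processing"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain what abliteration does to a language model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```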
What makes THIS different from all the other models?
The primary differentiator of this specific model is its decensored nature. The base Qwen3-4B-Instruct-2507 already offers a native context length of 262,144 tokens and strong results on benchmarks such as MMLU-Pro, GPQA, AIME25, and LiveCodeBench; this heretic-v3 variant adds an explicit reduction in refusals. The README reports a drop from 99/100 refusals for the original model to 9/100 for this modified version, achieved with specific abliteration parameters. The result is less restrictive output while retaining the original model's strengths in reasoning, coding, and agentic tasks.
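For context, a crude way to reproduce a refusal count like the 99/100 versus 9/100 figures is to run a fixed prompt set through the model and count answers that open with a stock refusal phrase. The sketch below is a hypothetical illustration, not the methodology Heretic actually uses; the prompt source and the marker list are placeholders.

```python
# Hypothetical refusal counter: not Heretic's actual evaluation, just a sketch.
def count_refusals(generate, prompts):
    """Count responses that open with a stock refusal phrase.

    `generate` is any callable mapping a prompt string to a response string,
    e.g. a thin wrapper around model.generate from the snippet above.
    """
    refusal_markers = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")
    refused = 0
    for prompt in prompts:
        reply = generate(prompt).strip().lower()
        if reply.startswith(refusal_markers):
            refused += 1
    return refused

# Usage (with user-supplied helpers):
#   prompts = load_prompts("my_eval_prompts.txt")
#   print(f"{count_refusals(run_model, prompts)}/{len(prompts)} refusals")
```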
Should I use this for my use case?
You should consider this model if:
- Your application requires a 4-billion-parameter model with a very long context window (up to 262,144 tokens).
- You need strong performance in instruction following, logical reasoning, coding, and tool usage.
- Your use case benefits from a model that is less prone to refusing requests or generating filtered content, especially compared to its original, more restrictive counterpart.
- You are developing applications that leverage agentic capabilities, as the base Qwen3 model excels at tool calling (see the chat-template sketch after this list).
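Since tool calling came up: below is a sketch of how a tool can be exposed through the tokenizer's chat template. The `get_weather` function is a made-up example, and passing plain Python functions via the `tools` argument assumes a reasonably recent transformers version.

```python
# Sketch: render a tool-aware prompt with the chat template.
# get_weather is a made-up example tool; recent transformers versions build a
# tool schema from the function's type hints and docstring.
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """
    Return the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny in {city}"  # stub; a real tool would call a weather API

tokenizer = AutoTokenizer.from_pretrained(
    "p-e-w/Qwen3-4B-Instruct-2507-heretic-v3-quantized-processing"
)
messages = [{"role": "user", "content": "What's the weather in Oslo right now?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False
)
print(prompt)  # inspect how the tool schema is injected before generating
```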
You might reconsider if:
- Your application strictly requires highly moderated or safety-aligned content generation, since this decensored model is far less likely to refuse potentially sensitive prompts.
- You need a larger model for maximum raw capability, though this 4B model posts competitive results against larger models on some benchmarks.