p-e-w/Qwen3-4B-Instruct-2507-heretic-v4

Hugging Face · Text generation · Model size: 4B · Quant: BF16 · Ctx length: 32k · Published: Feb 14, 2026 · License: apache-2.0 · Architecture: Transformer

p-e-w/Qwen3-4B-Instruct-2507-heretic-v4 is a 4-billion-parameter instruction-tuned causal language model based on Qwen's Qwen3-4B-Instruct-2507, with a native context length of 262,144 tokens. It has been decensored with the Heretic tool, reducing refusals from 100/100 to 9/100 while preserving the original model's capabilities in instruction following, logical reasoning, and long-context understanding. It is optimized for general-purpose conversational AI where broader response flexibility is desired.


Model Overview

p-e-w/Qwen3-4B-Instruct-2507-heretic-v4 is a 4-billion-parameter causal language model derived from Qwen's Qwen3-4B-Instruct-2507. This iteration was decensored with Heretic v1.2.0, cutting the refusal rate from 100% to 9% relative to the original model. It retains the base model's native context length of 262,144 tokens, making it well suited to processing extensive inputs.
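The card itself ships no usage snippet, so the following is a minimal sketch using the standard Hugging Face transformers causal-LM API. The model ID comes from this page; the BF16 dtype matches the quant listed above, while the prompt and generation budget are illustrative assumptions, not recommended settings.

```python
# Minimal sketch: load the model with the standard transformers API.
# (BF16 matches the listed quant; prompt and token budget are illustrative.)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "p-e-w/Qwen3-4B-Instruct-2507-heretic-v4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain KV caching in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```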

Key Capabilities & Enhancements

This model inherits Qwen3-4B-Instruct-2507's strengths, offering:

  • Improved instruction following and logical reasoning.
  • Enhanced text comprehension, mathematics, science, and coding abilities.
  • Substantial gains in long-tail knowledge coverage across multiple languages.
  • Better alignment with user preferences for subjective and open-ended tasks.
  • Exceptional long-context understanding up to 256K tokens.
  • Strong tool-calling capabilities, recommended for use with Qwen-Agent (see the sketch after this list).
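As a rough sketch of that tool-calling flow: recent transformers releases accept a `tools` argument in `apply_chat_template`, rendering function schemas into the chat prompt. Here `get_weather` is a hypothetical example tool, the block reuses `tokenizer` and `model` from the quickstart above, and parsing the model's tool-call output is only indicated in a comment; Qwen-Agent automates that parse-execute loop.

```python
# Sketch of tool-use prompting (get_weather is a hypothetical tool;
# reuses `tokenizer` and `model` from the quickstart above).
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny, 22 °C in {city}"

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],  # schema is derived from the signature and docstring
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# The model is expected to emit a structured tool call (a <tool_call> JSON
# block in Qwen's format) for the caller to parse, execute, and feed back.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```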

Performance Highlights

Benchmarked against the original Qwen3-4B-Instruct-2507, this model delivers comparable or better results across a range of metrics, notably:

  • 69.6 on MMLU-Pro and 84.2 on MMLU-Redux.
  • 47.4 on AIME25 and 80.2 on ZebraLogic for reasoning.
  • 35.1 on LiveCodeBench v6 and 76.8 on MultiPL-E for coding.
  • 83.5 on Creative Writing v3 and 83.4 on WritingBench for alignment.

When to Use This Model

This model is particularly well-suited for applications requiring:

  • General-purpose conversational AI where a wider range of responses is acceptable.
  • Tasks benefiting from extensive context, such as summarization of long documents or complex code analysis (a long-context sketch follows this list).
  • Creative writing and open-ended content generation due to its enhanced alignment and reduced refusal rates.
  • Agentic workflows leveraging its strong tool-calling features.
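For the long-context use case, here is a hedged illustration reusing `tokenizer` and `model` from the quickstart; `report.txt` is a hypothetical input file, and the 1,024-token output budget is an arbitrary choice.

```python
# Sketch of long-document summarization within the 262,144-token window.
# (report.txt is a hypothetical file; reuses objects from the quickstart.)
with open("report.txt", encoding="utf-8") as f:
    document = f.read()

messages = [{
    "role": "user",
    "content": f"Summarize the key findings of this document:\n\n{document}",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Keep prompt plus generation inside the native context window.
assert inputs.shape[-1] + 1024 <= 262_144, "document exceeds the context window"

output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```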