heretic-org/Qwen3-4B-Instruct-2507-heretic

Hosted on Hugging Face

Text generation · Model size: 4B · Quant: BF16 · Context length: 32K (serving) · Published: Feb 14, 2026 · License: apache-2.0 · Architecture: Transformer

The heretic-org/Qwen3-4B-Instruct-2507-heretic is a 4.0 billion parameter causal language model, a decensored version of Qwen's Qwen3-4B-Instruct-2507, featuring a native context length of 262,144 tokens. This model is specifically modified to reduce refusals, demonstrating enhanced general capabilities in instruction following, reasoning, and coding, while also excelling in long-tail knowledge coverage and subjective tasks. It is optimized for applications requiring less restrictive content generation and robust performance across various linguistic and logical challenges.


Overview

This model, heretic-org/Qwen3-4B-Instruct-2507-heretic, is a decensored variant of the original Qwen3-4B-Instruct-2507, produced with the Heretic v1.2.0 tool. It retains the base model's 4.0 billion parameters and 262,144-token native context length. The key difference is a sharply reduced refusal rate: 5 refusals out of 100 test prompts, versus 100 out of 100 for the original, making it suitable for use cases that require less content filtering.
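Since the model keeps the base Qwen3 architecture, it can be loaded with the standard `transformers` chat workflow. The sketch below is illustrative, not taken from the model card: the system prompt, example question, and generation settings are assumptions, and running it requires `transformers` plus enough memory for a 4B BF16 model.

```python
"""Minimal generation sketch for heretic-org/Qwen3-4B-Instruct-2507-heretic.

Assumes the model uses the standard Qwen3 chat template and that
`transformers` (and enough RAM/VRAM for a 4B BF16 model) is installed.
"""

MODEL_ID = "heretic-org/Qwen3-4B-Instruct-2507-heretic"


def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Build the chat-format message list consumed by the tokenizer's chat template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def main() -> None:
    # Deferred import so the helper above stays importable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    text = tokenizer.apply_chat_template(
        build_messages("Summarize the trade-offs of BF16 inference."),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:],
                           skip_special_tokens=True))


if __name__ == "__main__":
    main()
```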

Key Capabilities

  • Enhanced General Performance: Demonstrates significant improvements in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
  • Extensive Knowledge Coverage: Shows substantial gains in long-tail knowledge across multiple languages.
  • User Alignment: Offers markedly better alignment with user preferences for subjective and open-ended tasks, leading to more helpful and higher-quality text generation.
  • Agentic Abilities: Excels in tool calling, with recommended integration via Qwen-Agent for simplified development.
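For the tool-calling path, the recommendation is Qwen-Agent. The sketch below follows Qwen-Agent's documented `Assistant` interface, but the server URL, model name, and choice of the built-in `code_interpreter` tool are illustrative assumptions; it also presumes the model is already served behind an OpenAI-compatible endpoint (e.g. vLLM).

```python
"""Tool-calling sketch via Qwen-Agent (the recommended integration path).

Assumes `qwen-agent` is installed and an OpenAI-compatible server is already
serving the model; the endpoint URL and tool choice are assumptions.
"""

# Configuration pointing Qwen-Agent at a locally served model (values assumed).
LLM_CFG = {
    "model": "heretic-org/Qwen3-4B-Instruct-2507-heretic",
    "model_server": "http://localhost:8000/v1",  # e.g. a vLLM OpenAI-compatible endpoint
    "api_key": "EMPTY",
}


def initial_messages(task: str) -> list[dict]:
    """Start a conversation with a single user turn."""
    return [{"role": "user", "content": task}]


def main() -> None:
    # Deferred import so the config above stays importable without qwen-agent.
    from qwen_agent.agents import Assistant

    bot = Assistant(llm=LLM_CFG, function_list=["code_interpreter"])
    responses = []
    # bot.run streams incrementally growing response lists; keep the last one.
    for responses in bot.run(messages=initial_messages("Plot y = x**2 for x in [0, 10].")):
        pass
    print(responses)


if __name__ == "__main__":
    main()
```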

Performance Highlights

Compared to the original Qwen3-4B-Instruct-2507, this model shows competitive or superior performance across various benchmarks, particularly in:

  • Knowledge: Achieves 69.6 on MMLU-Pro and 62.0 on GPQA.
  • Reasoning: Scores 47.4 on AIME25 and 80.2 on ZebraLogic.
  • Coding: Reaches 35.1 on LiveCodeBench v6 and 76.8 on MultiPL-E.
  • Alignment: Scores 43.4 on Arena-Hard v2 and 83.5 on Creative Writing v3.

Good For

  • Applications requiring a less restrictive content policy.
  • Tasks demanding strong instruction following and logical reasoning.
  • Scenarios benefiting from extended context understanding (up to 262K tokens).
  • Developers looking for a model with robust tool-calling capabilities.
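When targeting the extended context, it helps to sanity-check that a document actually fits the 262,144-token window before sending it. This sketch uses the rough ~4-characters-per-token heuristic (an assumption; for exact counts, tokenize with the model's own tokenizer via `AutoTokenizer`).

```python
"""Rough check of whether a prompt fits the model's 262,144-token window.

The characters-per-token rate is a heuristic assumption, not the tokenizer's
real rate; use the model's tokenizer for exact counts.
"""

NATIVE_CONTEXT = 262_144
CHARS_PER_TOKEN = 4  # heuristic average for English text


def estimated_tokens(text: str) -> int:
    """Estimate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """True if the prompt likely fits while leaving room for generation."""
    return estimated_tokens(text) + reserved_for_output <= NATIVE_CONTEXT


print(fits_in_context("word " * 10_000))  # ~12.5K estimated tokens → True
```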