p-e-w/Qwen3-4B-Instruct-2507-heretic

4B parameters · BF16 · License: apache-2.0

Model Overview

p-e-w/Qwen3-4B-Instruct-2507-heretic is a 4-billion-parameter causal language model based on the Qwen3-4B-Instruct-2507 architecture, with a native context length of 262,144 tokens. It is a decensored variant produced with the Heretic v1.0.0 tool, which lowers the refusal rate from 99/100 to 21/100 compared to the original model.
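
To try the model, a minimal loading sketch with the Hugging Face transformers library might look like the following. The model ID comes from this card; the dtype and device settings are illustrative defaults rather than requirements.

```python
# Minimal loading sketch (assumes transformers and a recent PyTorch are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "p-e-w/Qwen3-4B-Instruct-2507-heretic"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # picks up the BF16 weights where supported
    device_map="auto",    # requires accelerate; places layers on available devices
)
```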

Key Capabilities & Enhancements

This model inherits the strengths of Qwen3-4B-Instruct-2507, offering improvements across a range of domains:

  • General Capabilities: Enhanced instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
  • Long-Context Understanding: Handles contexts of up to 256K tokens, making it suitable for complex multi-turn interactions and extensive document analysis.
  • Alignment & Subjectivity: Demonstrates better alignment with user preferences in subjective and open-ended tasks, leading to more helpful responses and higher-quality text generation.
  • Multilingual Support: Features substantial gains in long-tail knowledge coverage across multiple languages.

Performance & Differentiation

While maintaining the base model's performance in areas such as MMLU-Pro (69.6), AIME25 (47.4), and Creative Writing v3 (83.5), the primary differentiator of this 'heretic' version is its reduced content moderation. This makes it suitable for use cases where the original model's high refusal rate would be restrictive, offering a less constrained generative experience. The model operates only in non-thinking mode and does not generate <think></think> blocks, so specifying enable_thinking=False is not required.
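
Because the model runs only in non-thinking mode, a chat-template call needs no enable_thinking argument. The sketch below is illustrative and assumes the `model` and `tokenizer` objects from the loading example above; the prompt and generation parameters are placeholders.

```python
# Illustrative generation sketch (assumes `model` and `tokenizer` are already loaded).
messages = [{"role": "user", "content": "Summarize the plot of Hamlet in three sentences."}]

# No enable_thinking flag is passed: the 2507 Instruct variants are non-thinking only.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```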