0xA50C1A1/Qwen3-4B-Instruct-2507-Heretic

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 13, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The 0xA50C1A1/Qwen3-4B-Instruct-2507-Heretic is a 4.0 billion parameter instruction-tuned causal language model, based on the Qwen3 architecture by Qwen, with a native context length of 262,144 tokens. This model is a decensored version of Qwen/Qwen3-4B-Instruct-2507, specifically modified to reduce refusals and enhance open-ended responses. It excels in instruction following, logical reasoning, mathematics, coding, and long-context understanding, making it suitable for applications requiring less restrictive content generation.

Loading preview...

Overview

This model, 0xA50C1A1/Qwen3-4B-Instruct-2507-Heretic, is a 4.0 billion parameter instruction-tuned causal language model derived from the Qwen3-4B-Instruct-2507 base model by Qwen. It features a substantial native context length of 262,144 tokens, enabling extensive long-context understanding. A key differentiator is its "decensored" nature, achieved using the Heretic v1.2.0 tool, which significantly reduces content refusals from 100/100 in the original model to 5/100, as measured by KL divergence.

Key Capabilities

  • Enhanced Instruction Following & Reasoning: Demonstrates significant improvements in general capabilities, including logical reasoning, text comprehension, mathematics, science, and coding.
  • Extended Context Understanding: Natively supports a context length of 262,144 tokens, ideal for processing and generating very long texts.
  • Reduced Refusals: Modified to be less restrictive in content generation, offering more helpful and open-ended responses compared to its highly aligned predecessor.
  • Multilingual & Knowledge Coverage: Shows substantial gains in long-tail knowledge coverage across multiple languages.
  • Agentic Use: Excels in tool-calling capabilities, with recommended integration via Qwen-Agent for complex agentic workflows.

Performance Highlights

The model shows strong performance across various benchmarks, often outperforming its base model and other comparably sized models:

  • Knowledge: Achieves 69.6 on MMLU-Pro and 62.0 on GPQA.
  • Reasoning: Scores 47.4 on AIME25 and 80.2 on ZebraLogic.
  • Coding: Reaches 35.1 on LiveCodeBench v6 and 76.8 on MultiPL-E.
  • Alignment: Scores 83.5 on Creative Writing v3 and 83.4 on WritingBench, indicating strong user preference alignment in subjective tasks.

Good For

  • Applications requiring less restrictive content generation or exploration of diverse topics.
  • Tasks demanding deep understanding of very long documents or conversations.
  • Use cases involving complex instruction following, logical reasoning, and mathematical problem-solving.
  • Code generation and agentic workflows where tool-calling is essential.