vlx1/Qwen3-4B-Instruct-2507-heretic

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

vlx1/Qwen3-4B-Instruct-2507-heretic is a 4.0-billion-parameter causal language model derived from Qwen's Qwen3-4B-Instruct-2507 and decensored with Heretic v1.2.0. It retains the original's strengths in instruction following, logical reasoning, and long-context understanding up to 262,144 tokens, while substantially reducing refusals. It is suited to general-purpose instruction-tuned tasks, including mathematics, science, coding, and agentic tool use, with a focus on more helpful, open-ended responses.


Overview

vlx1/Qwen3-4B-Instruct-2507-heretic is a decensored variant of Qwen's Qwen3-4B-Instruct-2507, produced with Heretic v1.2.0. The modification reduces the refusal rate from the original's 100/100 to 2/100 on Heretic's evaluation prompts, while holding the KL divergence from the original model's outputs to 0.2742, meaning the decensoring only mildly perturbs the underlying output distribution.
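
For orientation, here is a minimal inference sketch using Hugging Face transformers. The model id is the one this card describes; the dtype, device placement, and prompt are illustrative assumptions, and the model is assumed to ship a standard chat template like its Qwen3 base.

```python
# Minimal generation sketch; model id from this card, everything else is
# standard transformers boilerplate (assumed, not taken from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vlx1/Qwen3-4B-Instruct-2507-heretic"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain KL divergence in one paragraph."}]

# Apply the model's chat template and generate a response.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```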

Key Capabilities

  • Decensored Responses: Specifically engineered to reduce content refusals, offering more open-ended and less restricted outputs.
  • Enhanced General Capabilities: Shows significant improvements in instruction following, logical reasoning, text comprehension, mathematics, science, and coding.
  • Long-Context Understanding: Natively supports a context length of 262,144 tokens, making it suitable for complex, long-form tasks.
  • Agentic Tool Usage: Excels in tool calling; Qwen-Agent is the recommended integration path for streamlined development (a minimal sketch follows this list).
  • Multilingual Support: Features substantial gains in long-tail knowledge coverage across multiple languages.
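
Since Qwen-Agent is the recommended route for tool calling, here is a minimal, hedged sketch following the pattern used in the upstream Qwen3 model cards. The endpoint URL assumes the model is already served behind an OpenAI-compatible API (for example via vLLM); the prompt and tool choice are illustrative.

```python
# Tool-use sketch with Qwen-Agent; endpoint and prompt are assumptions.
from qwen_agent.agents import Assistant

llm_cfg = {
    "model": "vlx1/Qwen3-4B-Instruct-2507-heretic",
    "model_server": "http://localhost:8000/v1",  # assumed local OpenAI-compatible server
    "api_key": "EMPTY",
}

# 'code_interpreter' is one of Qwen-Agent's built-in tools.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Plot y = x^2 for x in [-5, 5]."}]

# bot.run streams progressively longer response lists; the final
# iteration holds the complete conversation turn.
for responses in bot.run(messages=messages):
    pass
print(responses[-1]["content"])
```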

Performance Highlights

The model posts strong results across knowledge, reasoning, coding, and alignment benchmarks for its size class, retaining the capabilities of its base model. Notable scores include:

  • Knowledge: MMLU-Pro (69.6), MMLU-Redux (84.2), GPQA (62.0).
  • Reasoning: AIME25 (47.4), HMMT25 (31.0), ZebraLogic (80.2).
  • Coding: LiveCodeBench v6 (35.1), MultiPL-E (76.8).
  • Alignment: Creative Writing v3 (83.5), WritingBench (83.4).

Good For

  • Applications requiring less restrictive content generation and reduced refusals.
  • Complex tasks benefiting from extensive context understanding (up to 262K tokens; a serving sketch follows this list).
  • Instruction-following, logical reasoning, and code generation.
  • Agentic workflows and tool-use scenarios.
  • Multilingual applications and tasks requiring broad knowledge coverage.
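
To make the long-context point concrete, here is a sketch of offline serving with vLLM. This is an assumption-laden illustration, not an official recipe: the card lists the hosted endpoint at a 32k context while 262,144 is the model's native maximum, so running at full length requires sufficient GPU memory; the input file name and sampling settings are hypothetical.

```python
# Long-context serving sketch with vLLM's offline API (assumed setup).
from vllm import LLM, SamplingParams

llm = LLM(
    model="vlx1/Qwen3-4B-Instruct-2507-heretic",
    max_model_len=262144,  # native context length per this card; needs ample GPU memory
)

sampling = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=1024)

# Hypothetical long input document to summarize.
with open("long_report.txt") as f:
    document = f.read()

messages = [
    {"role": "user", "content": f"Summarize the key findings:\n\n{document}"},
]

# llm.chat applies the model's chat template and runs generation.
outputs = llm.chat(messages, sampling_params=sampling)
print(outputs[0].outputs[0].text)
```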