notcvnt/Qwen3-4B-Thinking-2507-heretic

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Nov 17, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

The notcvnt/Qwen3-4B-Thinking-2507-heretic model is a 4 billion parameter causal language model, based on the Qwen3 architecture by Qwen, with a native context length of 262,144 tokens. This version is a decensored variant of Qwen/Qwen3-4B-Thinking-2507, specifically optimized for highly complex reasoning tasks across logical reasoning, mathematics, science, and coding. It features significantly improved performance on reasoning benchmarks and enhanced long-context understanding, operating exclusively in a 'thinking mode' for deeper problem-solving.


Model Overview

notcvnt/Qwen3-4B-Thinking-2507-heretic is a 4 billion parameter causal language model, derived from the Qwen3 architecture developed by Qwen. This particular iteration is a decensored version of the original Qwen/Qwen3-4B-Thinking-2507 model, created using the Heretic v1.0.1 tool. It maintains a substantial native context length of 262,144 tokens.
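Because the model runs only in thinking mode, its raw output carries a reasoning trace followed by the final answer, and downstream code usually wants the two separated. The sketch below is one way to do that. It assumes the reasoning ends at a closing `</think>` tag and that the opening `<think>` tag may be absent from the generated text; neither detail is stated on this page, and the helper name `split_thinking` is hypothetical.

```python
def split_thinking(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumption: the reasoning trace ends at the last `close_tag`, and an
    opening <think> tag may or may not appear in the generated text.
    """
    head, sep, tail = text.rpartition(close_tag)
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", text.strip()
    # Drop an opening tag if the model (or chat template) emitted one.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


# Hypothetical raw output, used only to exercise the helper.
raw = "The user asks for 2 + 2. That is 4.\n</think>\n\nThe answer is 4."
reasoning, answer = split_thinking(raw)
print(answer)
```

Splitting on the last closing tag rather than the first keeps the answer intact if the reasoning itself happens to quote the tag.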

Key Differentiators & Capabilities

  • Decensored Variant: This model is explicitly designed to refuse less often than the original model: it produced 4 refusals out of 100 test prompts versus 99 out of 100 for the original. Its reported KL divergence from the original model is 0.16, a measure of how far its output distribution has shifted.
  • Enhanced Reasoning: It is specifically optimized for "thinking capability," demonstrating significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks. It operates exclusively in a dedicated "thinking mode."
  • Long-Context Understanding: Offers enhanced long-context understanding across its native 256K (262,144-token) window, making it suitable for tasks requiring extensive contextual analysis.
  • Agentic Capabilities: Excels at tool calling; Qwen-Agent is recommended for making use of its agentic abilities.
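The two decensoring figures above measure different things: the refusal count says how often the model declines, while KL divergence says how far its next-token distribution has moved from the original. The sketch below illustrates both quantities on toy data. The probability vectors are invented, the direction KL(original || modified) is an assumption, and how Heretic actually computes its numbers is not described on this page.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same tokens."""
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def refusal_rate(refusals, prompts):
    """Fraction of prompts that were refused."""
    return refusals / prompts


# Hypothetical next-token probabilities over a three-token vocabulary.
original = [0.70, 0.20, 0.10]
modified = [0.60, 0.25, 0.15]

print("KL(original || modified):", kl_divergence(original, modified))
print("refusal rate, this variant:", refusal_rate(4, 100))
print("refusal rate, original:", refusal_rate(99, 100))
```

A divergence of zero would mean the two distributions are identical, so a small value alongside a low refusal rate suggests refusals were reduced without shifting the model's other behavior much.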

Performance Highlights

Compared to the earlier Qwen3-4B Thinking model, the 2507 release on which this variant is based reports notable improvements across various metrics:

  • Reasoning: Achieves 81.3 on AIME25 (vs 65.6) and 55.5 on HMMT25 (vs 42.1).
  • Alignment: Scores 87.4 on IFEval (vs 81.9) and 75.6 on Creative Writing v3 (vs 61.1).
  • Agent: Shows strong gains in BFCL-v3 (71.2 vs 65.9) and various TAU benchmarks.
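To make the size of these gaps easier to compare, the short sketch below computes the absolute point gain for each benchmark pair listed above. It uses only the scores quoted in this section; the dictionary layout and variable names are illustrative.

```python
# (newer score, earlier score) pairs as quoted in the list above.
scores = {
    "AIME25": (81.3, 65.6),
    "HMMT25": (55.5, 42.1),
    "IFEval": (87.4, 81.9),
    "Creative Writing v3": (75.6, 61.1),
    "BFCL-v3": (71.2, 65.9),
}

# Absolute gain in points, rounded to one decimal to hide float noise.
gains = {name: round(new - old, 1) for name, (new, old) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")

largest = max(gains, key=gains.get)
print("largest gain:", largest)
```

On these figures the largest jump is on AIME25 (15.7 points) and the smallest on BFCL-v3 (5.3 points).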

Recommended Use Cases

This model is particularly well-suited for:

  • Highly Complex Reasoning Tasks: Ideal for scenarios demanding deep logical analysis, mathematical problem-solving, and scientific inquiry.
  • Code Generation and Analysis: Improved performance on coding benchmarks suggests its utility in programming-related applications.
  • Agent-based Systems: Its strong tool-calling capabilities make it a good candidate for integration into agentic workflows.
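For agentic workflows, the application has to turn the model's tool-call text into real function calls. The sketch below shows the general shape of that step without using Qwen-Agent, whose API is not covered here. It assumes tool calls arrive as JSON objects wrapped in `<tool_call>` tags with `name` and `arguments` fields, which this page does not confirm. The `add` tool and all helper names are hypothetical.

```python
import json
import re

# Assumed wire format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return the well-formed tool-call payloads found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1).strip())
        except json.JSONDecodeError:
            continue  # Skip malformed JSON instead of failing the whole turn.
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


def run_tool_calls(text, tools):
    """Dispatch each extracted call to a registered Python function."""
    results = []
    for call in extract_tool_calls(text):
        fn = tools.get(call["name"])
        if fn is None:
            continue  # Unknown tool name: ignore rather than guess.
        results.append(fn(**call.get("arguments", {})))
    return results


def add(a, b):
    """Hypothetical example tool."""
    return a + b


TOOLS = {"add": add}

# Hypothetical model output, used only to exercise the parser.
sample = (
    "I will add the numbers.\n"
    '<tool_call>\n{"name": "add", "arguments": {"a": 2, "b": 3}}\n</tool_call>'
)
print(run_tool_calls(sample, TOOLS))  # prints [5]
```

In practice the results would be fed back to the model as tool messages for the next turn; a framework such as Qwen-Agent handles that loop and the prompt templates for you.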