notcvnt/Qwen3-4B-Thinking-2507-heretic
The notcvnt/Qwen3-4B-Thinking-2507-heretic model is a 4 billion parameter causal language model, based on the Qwen3 architecture by Qwen, with a native context length of 262,144 tokens. This version is a decensored variant of Qwen/Qwen3-4B-Thinking-2507, specifically optimized for highly complex reasoning tasks across logical reasoning, mathematics, science, and coding. It features significantly improved performance on reasoning benchmarks and enhanced long-context understanding, operating exclusively in a 'thinking mode' for deeper problem-solving.
Loading preview...
Model Overview
notcvnt/Qwen3-4B-Thinking-2507-heretic is a 4 billion parameter causal language model, derived from the Qwen3 architecture developed by Qwen. This particular iteration is a decensored version of the original Qwen/Qwen3-4B-Thinking-2507 model, created using the Heretic v1.0.1 tool. It maintains a substantial native context length of 262,144 tokens.
Key Differentiators & Capabilities
- Decensored Variant: This model is explicitly designed to have fewer refusals compared to its original counterpart, with 4 refusals out of 100 versus 99/100 for the base model, as indicated by KL divergence of 0.16.
- Enhanced Reasoning: It is specifically optimized for "thinking capability," demonstrating significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks. It operates exclusively in a dedicated "thinking mode."
- Long-Context Understanding: Features enhanced 256K long-context understanding, making it suitable for tasks requiring extensive contextual analysis.
- Agentic Capabilities: Excels in tool-calling, with recommendations to use Qwen-Agent for leveraging its agentic abilities.
Performance Highlights
Compared to the original Qwen3-4B Thinking model, this version shows notable improvements across various metrics:
- Reasoning: Achieves 81.3 on AIME25 (vs 65.6) and 55.5 on HMMT25 (vs 42.1).
- Alignment: Scores 87.4 on IFEval (vs 81.9) and 75.6 on Creative Writing v3 (vs 61.1).
- Agent: Shows strong gains in BFCL-v3 (71.2 vs 65.9) and various TAU benchmarks.
Recommended Use Cases
This model is particularly well-suited for:
- Highly Complex Reasoning Tasks: Ideal for scenarios demanding deep logical analysis, mathematical problem-solving, and scientific inquiry.
- Code Generation and Analysis: Improved performance on coding benchmarks suggests its utility in programming-related applications.
- Agent-based Systems: Its strong tool-calling capabilities make it a good candidate for integration into agentic workflows.