Roman0/Qwen3-4B-Instruct-2507-heretic is a 4-billion-parameter instruction-tuned causal language model: a decensored version of Qwen/Qwen3-4B-Instruct-2507 created with Heretic v1.1.0. It retains the original's 262,144-token native context length while cutting the refusal rate from 100/100 to 4/100. The model targets general capabilities, including instruction following, logical reasoning, mathematics, coding, and agentic tool use, making it suitable for applications that require less restrictive content generation.
Overview
Roman0/Qwen3-4B-Instruct-2507-heretic is a 4-billion-parameter instruction-tuned causal language model derived from Qwen/Qwen3-4B-Instruct-2507. It was decensored with the Heretic v1.1.0 tool, lowering the refusal rate to 4/100 from the original model's 100/100, while preserving the original's 262,144-token native context length.
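Since this is a standard Hugging Face causal LM checkpoint, it can be loaded with `transformers` like the base model. The snippet below is a minimal sketch, not an official recipe; the prompt, dtype/device settings, and generation parameters are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Roman0/Qwen3-4B-Instruct-2507-heretic"

# Load tokenizer and model; dtype/device settings here are illustrative.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Build a chat-formatted prompt using the model's own chat template.
messages = [{"role": "user", "content": "Explain KL divergence in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate a response; sampling parameters are placeholders.
output_ids = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```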
Key Capabilities
- Decensored Output: Offers less restrictive content generation compared to the base model, with a drastically reduced refusal rate.
- Enhanced General Capabilities: Shows significant improvements in instruction following, logical reasoning, text comprehension, mathematics, science, and coding.
- Long-Context Understanding: Supports a native context length of 262,144 tokens, enabling processing of very long inputs.
- Multilingual Support: Features substantial gains in long-tail knowledge coverage across multiple languages.
- Agentic Tool Usage: Strong tool-calling support; Qwen-Agent is recommended for orchestrating calls in practice (see the sketch after this list).
- Subjective Task Alignment: Markedly better alignment with user preferences in subjective and open-ended tasks, leading to more helpful responses.
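For the tool-calling item above, the model card recommends Qwen-Agent for full agentic orchestration. As a minimal alternative, recent `transformers` versions let the chat template render tool schemas directly from Python functions via the `tools` argument; the sketch below is only an illustration under that assumption, and the `get_weather` function is made up.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Roman0/Qwen3-4B-Instruct-2507-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Hypothetical tool: the chat template derives a JSON schema from the
# signature and docstring of a plain Python function.
def get_weather(city: str) -> str:
    """Return a short weather summary for a city."""
    return f"Sunny in {city}."

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]

inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=256)
# The model may emit a structured tool call that your agent loop must parse,
# execute, and feed back; Qwen-Agent automates that cycle.
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```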
Performance Highlights
Compared to the original Qwen3-4B-Instruct-2507, this Heretic version shows a KL divergence of 0.1596 from the original's output distribution, indicating that the decensoring only modestly perturbs overall behavior. The base model itself performs strongly across benchmarks, often outperforming models such as GPT-4.1-nano-2025-04-14 in categories such as Knowledge (MMLU-Pro: 69.6), Reasoning (AIME25: 47.4, ZebraLogic: 80.2), Coding (MultiPL-E: 76.8), and Alignment (Creative Writing v3: 83.5).
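The KL divergence figure above comes from Heretic's own evaluation. For intuition, a rough per-token KL between the original and decensored models can be approximated as sketched below; the probe prompts, averaging, and tokenization choices are assumptions for illustration, not Heretic's exact procedure.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen3-4B-Instruct-2507"
ablated_id = "Roman0/Qwen3-4B-Instruct-2507-heretic"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto").eval()
ablated = AutoModelForCausalLM.from_pretrained(ablated_id, torch_dtype="auto", device_map="auto").eval()

# Tiny made-up probe set; Heretic uses its own prompt suite.
prompts = ["Summarize the plot of Hamlet.", "Write a haiku about rain."]

kls = []
with torch.no_grad():
    for prompt in prompts:
        ids = tokenizer(prompt, return_tensors="pt").input_ids.to(base.device)
        p_logits = base(ids).logits      # reference distribution P
        q_logits = ablated(ids).logits   # modified distribution Q
        # KL(P || Q) per position: sum over the vocab, then average over tokens.
        kl_per_token = F.kl_div(
            F.log_softmax(q_logits, dim=-1),
            F.log_softmax(p_logits, dim=-1),
            log_target=True,
            reduction="none",
        ).sum(dim=-1)
        kls.append(kl_per_token.mean().item())

print(f"approx. mean per-token KL divergence: {sum(kls) / len(kls):.4f}")
```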
Good For
- Applications requiring a less restrictive or decensored language model.
- Tasks benefiting from extremely long context windows (up to 262K tokens).
- Use cases demanding strong instruction following, logical reasoning, and mathematical abilities.
- Code generation and agentic workflows leveraging tool-use capabilities.
- Generating high-quality text for subjective and open-ended tasks.