vlx1/Qwen3-4B-Instruct-2507-heretic
vlx1/Qwen3-4B-Instruct-2507-heretic is a 4.0 billion parameter causal language model derived from Qwen's Qwen3-4B-Instruct-2507 and modified with Heretic v1.2.0 into a decensored variant. It retains the original's strengths in instruction following, logical reasoning, and long-context understanding up to 262,144 tokens, while substantially reducing refusals. It is suited to general-purpose instruction-tuned tasks, including mathematics, science, coding, and agentic tool usage, with a focus on more helpful, open-ended responses.
Overview
vlx1/Qwen3-4B-Instruct-2507-heretic is a decensored variant of Qwen's Qwen3-4B-Instruct-2507, a 4.0 billion parameter causal language model, produced with Heretic v1.2.0. Refusals drop from 100/100 on the original model to 2/100 on the same evaluation prompts, while the KL divergence from the original model's output distribution is kept to 0.2742.
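Since the checkpoint follows the standard Hugging Face chat-model layout, inference can be sketched with `transformers` as below. This is a minimal sketch, not an official quickstart; the generation settings and the lazy imports are assumptions, and a PyTorch backend is required to actually run `generate`.

```python
# Sketch of chat inference via Hugging Face transformers. Assumes the
# checkpoint "vlx1/Qwen3-4B-Instruct-2507-heretic" is on the Hub and that
# `transformers` plus a PyTorch backend are installed for generate().

MODEL_ID = "vlx1/Qwen3-4B-Instruct-2507-heretic"

def build_messages(user_prompt):
    """Wrap a user prompt in the chat-message format the tokenizer expects."""
    return [{"role": "user", "content": user_prompt}]

def generate(prompt, max_new_tokens=512):
    # Imported lazily so the helper above works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Apply the model's built-in chat template to get the raw prompt string.
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens; keep only the newly generated completion.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Quantized or served deployments (e.g. vLLM) would load the same checkpoint; only the loading call changes.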
Key Capabilities
- Decensored Responses: Specifically engineered to reduce content refusals, offering more open-ended and less restricted outputs.
- Enhanced General Capabilities: Retains the base model's strong instruction following, logical reasoning, text comprehension, mathematics, science, and coding.
- Long-Context Understanding: Natively supports a context length of 262,144 tokens, making it suitable for complex, long-form tasks.
- Agentic Tool Usage: Excels in tool calling, with recommended integration via Qwen-Agent for streamlined development.
- Multilingual Support: Features substantial gains in long-tail knowledge coverage across multiple languages.
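For the tool-calling workflow, a minimal Qwen-Agent sketch might look like the following. The endpoint URL, the vLLM serving command, and the choice of the built-in `code_interpreter` tool are illustrative assumptions, not part of this model card.

```python
def make_llm_cfg(model_server, api_key="EMPTY"):
    """Qwen-Agent LLM config pointing at an OpenAI-compatible endpoint.

    The endpoint is an assumption; one way to start it is e.g.
    `vllm serve vlx1/Qwen3-4B-Instruct-2507-heretic`.
    """
    return {
        "model": "vlx1/Qwen3-4B-Instruct-2507-heretic",
        "model_server": model_server,
        "api_key": api_key,
    }

def run_agent():
    # Lazy import: requires `pip install qwen-agent`.
    from qwen_agent.agents import Assistant

    # `code_interpreter` is one of Qwen-Agent's built-in tools.
    bot = Assistant(
        llm=make_llm_cfg("http://localhost:8000/v1"),
        function_list=["code_interpreter"],
    )
    messages = [{"role": "user", "content": "Use Python to compute 2**32."}]
    responses = []
    # Assistant.run streams progressively longer lists of response messages.
    for responses in bot.run(messages=messages):
        pass
    return responses[-1]["content"]
```

The same config dict works for other Qwen-Agent agent classes; only `function_list` changes per use case.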
Performance Highlights
This model posts strong scores across knowledge, reasoning, coding, and alignment benchmarks for its size class. Notable scores include:
- Knowledge: MMLU-Pro (69.6), MMLU-Redux (84.2), GPQA (62.0).
- Reasoning: AIME25 (47.4), HMMT25 (31.0), ZebraLogic (80.2).
- Coding: LiveCodeBench v6 (35.1), MultiPL-E (76.8).
- Alignment: Creative Writing v3 (83.5), WritingBench (83.4).
Good For
- Applications requiring less restrictive content generation and reduced refusals.
- Complex tasks benefiting from extensive context understanding (up to 262K tokens).
- Instruction-following, logical reasoning, and code generation.
- Agentic workflows and tool-use scenarios.
- Multilingual applications and tasks requiring broad knowledge coverage.
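When planning long-context workloads against the 262,144-token window, the budget can be checked up front. A minimal sketch; in practice the prompt token count would come from the model's tokenizer rather than being supplied directly:

```python
MAX_CONTEXT = 262_144  # native context length of Qwen3-4B-Instruct-2507

def fits_in_context(n_prompt_tokens, n_generate, limit=MAX_CONTEXT):
    """Return True if the prompt plus the planned generation budget
    fits inside the model's context window."""
    return n_prompt_tokens + n_generate <= limit
```

A check like this is useful before dispatching long-document tasks, since prompts that overflow the window are silently truncated or rejected depending on the serving stack.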