This is a 4-billion-parameter instruction-tuned causal language model: a decensored version of Qwen/Qwen3-4B-Instruct-2507 created using Heretic v1.2.0. The base model, developed by Qwen, features a native context length of 262,144 tokens and significant improvements in general capabilities, including instruction following, logical reasoning, mathematics, coding, and tool usage. The model is optimized for enhanced alignment with user preferences in subjective and open-ended tasks, making it suitable for generating helpful, high-quality text.
Model Overview
This model, p-e-w/Qwen3-4B-Instruct-2507-heretic-REPRODUCTION-TEST-2, is a decensored variant of the original Qwen/Qwen3-4B-Instruct-2507, processed with Heretic v1.2.0. It is a 4-billion-parameter causal language model from the Qwen3 family, featuring a native context length of 262,144 tokens.
Key Capabilities & Enhancements
- Decensored Output: Compared to the original, this version shows a dramatic reduction in refusals (14/100 vs. 100/100 on the refusal test set), indicating substantially less restrictive response behavior.
- General Performance: Offers substantial improvements across various domains including instruction following, logical reasoning, text comprehension, mathematics, science, and coding.
- Long-Context Understanding: Excels in processing and understanding very long inputs, supporting up to 256K tokens.
- User Alignment: Demonstrates better alignment with user preferences for subjective and open-ended tasks, leading to more helpful and higher-quality text generation.
- Tool Usage: Features enhanced tool-calling capabilities, with Qwen-Agent recommended for optimal integration (see the sketch below).
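
As a minimal sketch of tool calling via Qwen-Agent, assuming the model is already served behind an OpenAI-compatible endpoint (the server URL, port, and served model name below are placeholders, not values from this card):

```python
from qwen_agent.agents import Assistant

# LLM configuration: assumes a local OpenAI-compatible server
# (e.g. started with vllm or sglang); URL and name are placeholders.
llm_cfg = {
    "model": "Qwen3-4B-Instruct-2507-heretic-REPRODUCTION-TEST-2",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# Give the agent a built-in tool; Qwen-Agent ships a code interpreter.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Plot y = x^2 for x in [0, 10]."}]

# bot.run streams intermediate responses; keep only the final one.
for responses in bot.run(messages=messages):
    pass
print(responses)
```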
Performance Highlights
This model shows strong performance across several benchmarks, often outperforming its base model and even larger models in specific categories:
- Knowledge: Achieves 69.6 on MMLU-Pro and 84.2 on MMLU-Redux.
- Reasoning: Scores 47.4 on AIME25 and 80.2 on ZebraLogic.
- Coding: Reaches 35.1 on LiveCodeBench v6.
- Alignment: Scores 43.4 on Arena-Hard v2 and 83.5 on Creative Writing v3.
Usage Considerations
- The model operates in "non-thinking mode" and does not generate `<think></think>` blocks.
- Recommended sampling parameters are `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0` for optimal performance.
- Supports deployment with `sglang` and `vllm` for OpenAI-compatible API endpoints, and is compatible with local applications like Ollama and llama.cpp.
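
Below is a minimal sketch of local inference with the Hugging Face transformers API, using the recommended sampling parameters from this card (the prompt and `max_new_tokens` value are illustrative; everything else is standard transformers boilerplate):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "p-e-w/Qwen3-4B-Instruct-2507-heretic-REPRODUCTION-TEST-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Give me a short introduction to large language models."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling parameters recommended in this model card.
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    min_p=0.0,
)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For serving, an OpenAI-compatible endpoint can be started with, for example, `vllm serve p-e-w/Qwen3-4B-Instruct-2507-heretic-REPRODUCTION-TEST-2` (assuming a recent vLLM installation); sglang offers an equivalent serving mode.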