goldsean/Qwen3-4B-Instruct-2507-heretic: Decensored Qwen3-4B
This model is a decensored version of Qwen/Qwen3-4B-Instruct-2507, created with the Heretic v1.2.0 tool. It retains the core capabilities of the original model while significantly reducing content refusals. The base model, developed by Qwen, is a 4-billion-parameter instruction-tuned causal language model with a native context length of 262,144 tokens.
Key Capabilities & Enhancements
- Decensored Output: Achieves 0/100 refusals compared to 28/100 in the original model, making it suitable for use cases requiring less content filtering.
- General Performance: Inherits the 2507 update's significant improvements in instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
- Long-Context Understanding: Enhanced capabilities in processing and understanding very long contexts, up to 256K tokens.
- Multilingual Knowledge: Substantial gains in long-tail knowledge coverage across multiple languages.
- User Alignment: Markedly better alignment with user preferences for subjective and open-ended tasks, leading to more helpful and higher-quality text generation.
Performance Highlights (compared to original Qwen3-4B-Instruct-2507)
- KL divergence from the original model's output distribution: 2.0297 (the original scores 0 by definition)
- Refusals: 0/100 (vs. 28/100 for original)
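For intuition about the KL divergence figure above: it measures how far the decensored model's next-token probability distribution drifts from the original model's, averaged over evaluation prompts. A minimal sketch with purely hypothetical toy distributions (the `original` and `decensored` values below are illustrative, not measured from either model):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete probability distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a tiny 4-token vocabulary.
original = [0.70, 0.20, 0.05, 0.05]    # base model's probabilities (toy values)
decensored = [0.55, 0.30, 0.10, 0.05]  # modified model's probabilities (toy values)

print(round(kl_divergence(decensored, original), 4))  # ≈ 0.0583
print(kl_divergence(original, original))              # 0.0 — identical distributions
```

A divergence of 0 means the two models assign identical probabilities; larger values mean the abliteration changed the model's behavior more. Heretic's reported 2.0297 quantifies that overall shift.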
Recommended Use Cases
This model is ideal for developers and applications that need a powerful 4B-parameter instruction-tuned model with extensive context handling and a reduced propensity for content refusals. It excels at tasks demanding strong reasoning, coding, and general knowledge across domains and languages, particularly when the original model's content restrictions are undesirable. It also supports agentic use with tool calling via Qwen-Agent.
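A quickstart sketch using the standard Hugging Face `transformers` chat workflow, assuming the model is published on the Hub under `goldsean/Qwen3-4B-Instruct-2507-heretic` (the prompt text is illustrative):

```python
# Standard transformers inference sketch; requires a GPU (or ample RAM) and
# network access to download the checkpoint on first run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "goldsean/Qwen3-4B-Instruct-2507-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize KL divergence in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the checkpoint keeps the original architecture and tokenizer, any stack that runs Qwen3-4B-Instruct-2507 (transformers, vLLM, etc.) should load this model the same way.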