megabytes/Qwen2.5-0.5B-Instruct-heretic
megabytes/Qwen2.5-0.5B-Instruct-heretic is a 0.49 billion parameter instruction-tuned causal language model based on the Qwen2.5 architecture, developed by Qwen and subsequently decensored with the Heretic v1.2.0 tool. It retains the Qwen2.5 improvements in knowledge, coding, mathematics, and instruction following, along with the 32,768 token context length. Its primary differentiator is a significantly reduced refusal rate compared to the original Qwen2.5-0.5B-Instruct, making it suitable for use cases requiring less restrictive content generation.
Model Overview
This model, megabytes/Qwen2.5-0.5B-Instruct-heretic, is a decensored version of Qwen/Qwen2.5-0.5B-Instruct, produced with the Heretic v1.2.0 tool. The underlying architecture is unchanged: a 0.49 billion parameter instruction-tuned causal language model from the Qwen2.5 family with a 32,768 token context length.
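The model loads like any other Qwen2.5 checkpoint. Below is a minimal usage sketch with Hugging Face transformers; the prompt and generation settings are illustrative, not taken from the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "megabytes/Qwen2.5-0.5B-Instruct-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short summary of what model abliteration is."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```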
Key Differentiators & Capabilities
- Decensored Output: Refuses 3 of 100 evaluation prompts, versus 91 of 100 for the original model, indicating significantly less content filtering (a minimal measurement sketch follows this list).
- Enhanced Core Qwen2.5 Features: Inherits improvements from the Qwen2.5 series, including:
  - Increased knowledge and improved capabilities in coding and mathematics.
  - Significant advancements in instruction following and generating long texts (up to 8K tokens).
  - Better understanding of structured data (e.g., tables) and generation of structured outputs like JSON.
  - More resilience to diverse system prompts, enhancing role-play and chatbot condition-setting.
- Multilingual Support: Supports over 29 languages, including Chinese, English, French, Spanish, and more.
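Because the refusal-rate comparison above is the model's headline property, here is a hedged sketch of how such a rate could be measured. The prompt list, refusal markers, and decoding settings are assumptions for illustration; the card does not document Heretic's actual evaluation harness.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "megabytes/Qwen2.5-0.5B-Instruct-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative refusal markers; a real evaluation would use a sturdier classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am sorry", "as an ai")

def is_refusal(reply: str) -> bool:
    head = reply.strip().lower()
    return any(head.startswith(marker) for marker in REFUSAL_MARKERS)

def refusal_rate(prompts: list[str]) -> float:
    """Fraction of prompts whose greedy completion starts with a refusal phrase."""
    refusals = 0
    for prompt in prompts:
        input_ids = tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}],
            add_generation_prompt=True,
            return_tensors="pt",
        ).to(model.device)
        output_ids = model.generate(input_ids, max_new_tokens=64, do_sample=False)
        reply = tokenizer.decode(
            output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
        )
        refusals += is_refusal(reply)
    return refusals / len(prompts)
```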
Abliteration Parameters
The decensoring process used specific abliteration parameters, including a direction_index of 15.86 and per-matrix ablation weight adjustments to the attn.o_proj and mlp.down_proj projections; these weight modifications account for the model's altered refusal behavior.
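For context, directional ablation ("abliteration") generally works by removing a learned "refusal direction" from the output side of selected weight matrices. The sketch below shows that core operation on a stand-in matrix; the direction vector, scale, and dimensions are illustrative and do not reproduce Heretic's exact implementation.

```python
import torch

def ablate_direction(W: torch.Tensor, r: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Remove the component along unit vector r from W's output (row) space:
    W' = W - scale * (r r^T) W, so that r^T W' == 0 when scale == 1."""
    r = r / r.norm()                      # normalize the direction
    projection = torch.outer(r, r) @ W    # component of W along r
    return W - scale * projection

# Illustrative usage on a random matrix standing in for one projection weight.
hidden = 896                              # Qwen2.5-0.5B hidden size
W = torch.randn(hidden, hidden)
r = torch.randn(hidden)                   # placeholder "refusal direction"
W_ablated = ablate_direction(W, r)

# The ablated matrix produces (numerically) no output along the direction.
residual = (r / r.norm()) @ W_ablated
print(torch.allclose(residual, torch.zeros(hidden), atol=1e-4))  # True
```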
Use Cases
This model is particularly suited to applications where less restrictive, more direct response generation is desired, while still benefiting from the Qwen2.5 base model's strengths in coding, mathematics, and instruction following.