sci4ai/Qwen2.5-Coder-7B-Abliterated
sci4ai/Qwen2.5-Coder-7B-Abliterated is a 7.6-billion-parameter causal language model developed by sci4ai and based on the Qwen2.5-Coder architecture. It has undergone "abliteration" to remove refusal behaviors, particularly for code-related harmful content. It is intended for research settings where compliance with requests the original model would refuse is desired, such as exploring the boundaries of code generation without safety guardrails.
Overview
This model, sci4ai/Qwen2.5-Coder-7B-Abliterated, is a modified version of the Qwen/Qwen2.5-Coder-7B-Instruct model. It has been "abliterated" to remove refusal behaviors, meaning it will comply with requests that the original model would typically refuse, especially those related to code generation.
Key Capabilities & Differentiators
- Refusal Behavior Removal: Achieved through activation-based weight surgery, specifically targeting and removing the "refusal direction" from the model's residual stream.
- Methodology: Follows the approach of collecting hidden states from harmful and harmless prompts, computing per-layer refusal directions, and then ablating the o_proj and down_proj weight matrices by orthogonalizing them against these directions.
- Targeted Abliteration: While the base Qwen2.5-Coder has lighter refusal training, this abliterated version primarily impacts code-related refusals, such as those for exploit development, malware, and network attacks.
- Research Focus: Provided for research purposes to explore model behavior without safety guardrails, allowing for compliance with a broader range of user prompts.
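The orthogonalization step described in the list above can be illustrated on toy data. The following is a minimal NumPy sketch, not the authors' code: it assumes the refusal direction is the normalized difference of mean hidden states (the card does not say how the direction is computed), it uses random arrays in place of real hidden states and weights, and the function names are hypothetical.

```python
import numpy as np

def refusal_direction(harmful_states, harmless_states):
    """Unit-norm difference of mean hidden states (assumed method).

    Both inputs have shape (n_prompts, d_model).
    """
    diff = harmful_states.mean(axis=0) - harmless_states.mean(axis=0)
    return diff / np.linalg.norm(diff)

def orthogonalize(weight, direction):
    """Remove `direction` from the output space of `weight`.

    `weight` has shape (d_model, d_in), i.e. its output is written to the
    residual stream. Computes (I - r r^T) W for a unit vector r.
    """
    return weight - np.outer(direction, direction @ weight)

rng = np.random.default_rng(0)
d_model, d_in = 16, 64

# Toy stand-ins for hidden states collected at one layer.
harmful = rng.normal(size=(200, d_model)) + 0.5
harmless = rng.normal(size=(200, d_model))
r = refusal_direction(harmful, harmless)

# Toy stand-in for a projection matrix that writes to the residual stream.
W = rng.normal(size=(d_model, d_in))
W_ablated = orthogonalize(W, r)

# The ablated matrix can no longer write anything along r.
print(np.abs(r @ W_ablated).max())
```

After the projection, every output of `W_ablated` is orthogonal to `r`, which is the linear-algebra meaning of "orthogonalizing against a direction".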
Technical Details
- Parameters: 7.6 billion.
- Context Length: 32,768 tokens.
- Ablation Scope: All 28 layers of the model were ablated with a refusal weight of 0.6, using 200 harmful and 200 harmless prompts for direction computation.
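One plausible reading of a refusal weight of 0.6 is a scaling factor on the projection, so that 60% of each matrix's component along the refusal direction is removed and 40% remains. The card does not define the term, so the sketch below is an assumption; `partial_ablate` and `strength` are hypothetical names, and the arrays are random toy data.

```python
import numpy as np

def partial_ablate(weight, direction, strength):
    """Compute W - strength * r r^T W for the unit vector r along `direction`.

    strength=1.0 is full orthogonalization; strength=0.0 leaves W unchanged.
    """
    unit = direction / np.linalg.norm(direction)
    return weight - strength * np.outer(unit, unit @ weight)

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 32))   # toy weight writing to an 8-dim residual stream
r = rng.normal(size=8)         # toy direction, not a real refusal direction
r_unit = r / np.linalg.norm(r)

W_partial = partial_ablate(W, r, strength=0.6)

before = r_unit @ W            # component along r before ablation
after = r_unit @ W_partial     # component along r after ablation
print(np.allclose(after, 0.4 * before))
```

Under this reading, components orthogonal to the direction are untouched, and only the component along it shrinks by the chosen factor.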
Important Note
Users are responsible for the usage of this model, as the removal of safety guardrails means it will generate content that the original, unabliterated model would have refused.