Bender1011001/Qwen2.5-3B-Instruct-ABLITERATED
Bender1011001/Qwen2.5-3B-Instruct-ABLITERATED is a 3.1 billion parameter instruction-tuned causal language model based on Qwen2.5-3B-Instruct, developed by Bender1011001. This model features a 32768-token context length and has undergone a surgical removal of refusal behavior using orthogonal projection, resulting in a near-zero refusal rate with minimal impact on factual accuracy. It is specifically designed for applications requiring an uncensored base model, serving as a frozen backbone for advanced architectures like the Dual-System V2 project.
Qwen2.5-3B-Instruct — ABLITERATED: Uncensored Base Model
This model is a modified version of Qwen2.5-3B-Instruct, developed by Bender1011001, with its inherent refusal behavior surgically removed through a technique called orthogonal projection. This "abliteration" process, based on the FailSpy diff-of-means method (Arditi et al. 2024), targets and eliminates the refusal direction within the model's residual stream using pure linear algebra, without any fine-tuning or additional training data.
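The linear algebra behind this kind of edit can be sketched in a few lines. The example below is a minimal illustration on synthetic data, not the author's actual pipeline: the function names, the random stand-in activations, and the planted direction are all assumptions made for the demo. It estimates a unit "refusal direction" r as the normalized difference between the mean activations of two prompt sets, then replaces a weight matrix W that writes into the residual stream with (I - r r^T) W. The weight is assumed to have shape (d_model, d_in), with the residual-stream dimension on the first axis.

```python
import numpy as np


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations (diff-of-means)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def orthogonalize(weight, direction):
    """Return (I - r r^T) W for a weight of shape (d_model, d_in).

    Assumes the first axis is the residual-stream (output) dimension.
    """
    return weight - np.outer(direction, direction @ weight)


# Synthetic stand-ins for real activations and weights (illustrative only).
rng = np.random.default_rng(0)
d_model, d_in = 8, 16
planted = np.zeros(d_model)
planted[0] = 1.0  # hypothetical direction separating the two prompt sets
harmless = rng.normal(size=(64, d_model))
harmful = rng.normal(size=(64, d_model)) + 4.0 * planted

r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d_model, d_in))
W_abl = orthogonalize(W, r)

# After the edit, no input can make this matrix write along r.
x = rng.normal(size=d_in)
print(abs(r @ (W_abl @ x)))  # close to zero, up to floating-point error
```

Because the projection is applied to the weights once, it needs no gradient steps or training data beyond the two activation sets used to estimate r. Applying it a second time leaves the matrix unchanged, since the component along r is already gone.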
Key Capabilities & Features
- Near-Zero Refusal Rate: Achieves approximately 0% refusal rate, down from ~80% in the base model.
- Minimal Accuracy Loss: Maintains factual knowledge, with average accuracy dropping by only 0.4% across benchmarks such as ARC-Easy, HellaSwag, PIQA, WinoGrande, and BoolQ, a difference reported as statistically insignificant.
- Efficient Modification: The abliteration process is fast, taking only about 3 seconds on a GPU, as it involves direct manipulation of weight matrices (o_proj and down_proj) rather than a training loop.
- Hardware Friendly: Requires approximately 3.1 GB VRAM (bf16) and can run on GPUs with 4GB VRAM or more.
Good For
- Uncensored Applications: Ideal for use cases where a base model without built-in refusal mechanisms is required.
- Research into Model Safety & Control: Provides a valuable tool for studying and developing methods to control model behavior.
- Foundation for Advanced Architectures: Serves as the frozen backbone for projects like the Dual-System V2 architecture, where external modules manage safety and control, avoiding the "Refusal Re-Injection Trap" observed with adapters trained on censored models.