Name: Bender1011001/Qwen2.5-3B-Instruct-ABLITERATED API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Bender1011001

Qwen2.5-3B-Instruct — ABLITERATED: Uncensored Base Model

This model is a version of Qwen2.5-3B-Instruct modified by Bender1011001 to remove its built-in refusal behavior through a technique called orthogonal projection. This "abliteration" process, based on the FailSpy diff-of-means method (Arditi et al. 2024), targets the refusal direction within the model's residual stream and eliminates it using linear algebra alone, without any fine-tuning or additional training data.
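As a rough illustration of the idea described above, the following minimal sketch computes a difference-of-means direction from two sets of activations and projects it out of a vector. It is not the author's code: the function names (`refusal_direction`, `project_out`) and the toy 3-dimensional activations are assumptions made up for this example, and plain Python lists are used only to keep it dependency-free.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Difference of the two mean activations, normalized (hypothetical helper)."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    return unit([h - b for h, b in zip(mu_harmful, mu_harmless)])

def project_out(x, r_hat):
    """Remove the component of x along unit vector r_hat: x - (x . r_hat) r_hat."""
    c = dot(x, r_hat)
    return [xi - c * ri for xi, ri in zip(x, r_hat)]

# Toy "residual stream" activations (made-up numbers, 3 dimensions).
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

r_hat = refusal_direction(harmful, harmless)
cleaned = project_out([5.0, 2.0, -1.0], r_hat)

# After projection, the vector has no component left along r_hat.
print(r_hat, cleaned, dot(cleaned, r_hat))
```

In this toy setup the two groups differ only along the first axis, so the projection zeroes that coordinate and leaves the others untouched.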

Key Capabilities & Features

Near-Zero Refusal Rate: Achieves a refusal rate of approximately 0%, down from ~80% in the base model.
Minimal Accuracy Loss: Retains factual knowledge, with an average accuracy loss of only 0.4% across various benchmarks (e.g., ARC-Easy, HellaSwag, PIQA, WinoGrande, BoolQ), which is statistically insignificant.
Efficient Modification: The abliteration process takes only about 3 seconds on a GPU because it edits weight matrices (o_proj and down_proj) directly rather than running a training loop.
Hardware Friendly: Requires approximately 3.1 GB VRAM (bf16) and can run on GPUs with 4 GB VRAM or more.
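The direct weight edit mentioned above can be sketched as follows, under the assumption that each targeted matrix writes into the residual stream and is stored in the (out_features, in_features) layout used by `torch.nn.Linear`. Replacing W with (I - r rᵀ) W guarantees the layer's output has no component along the unit direction r, whatever the input. The helper names and the small matrix are hypothetical, and this is not the model's actual abliteration script.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [dot(row, x) for row in W]

def orthogonalize_output(W, r_hat):
    """Return (I - r r^T) W for a unit vector r_hat (hypothetical helper).

    W has shape (d_model, d_in): each row produces one residual-stream
    coordinate of the layer's output.
    """
    d_model, d_in = len(W), len(W[0])
    # r^T W: how strongly each input column writes along r_hat.
    r_t_w = [sum(r_hat[i] * W[i][j] for i in range(d_model)) for j in range(d_in)]
    # Subtract that contribution from every entry of W.
    return [[W[i][j] - r_hat[i] * r_t_w[j] for j in range(d_in)]
            for i in range(d_model)]

# Toy example: d_model = 2, d_in = 3, with a unit-length direction.
r_hat = [0.6, 0.8]
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

W_abl = orthogonalize_output(W, r_hat)

# Any input now yields an output orthogonal to r_hat (up to rounding).
x = [0.5, -1.0, 2.0]
leak_before = dot(r_hat, matvec(W, x))
leak_after = dot(r_hat, matvec(W_abl, x))
print(leak_before, leak_after)
```

Because this is a single matrix product per layer with no gradient steps, an edit of this kind is consistent with the short runtime reported above.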

Good For

Uncensored Applications: Ideal for use cases where a base model without built-in refusal mechanisms is required.
Research into Model Safety & Control: Provides a valuable tool for studying and developing methods to control model behavior.
Foundation for Advanced Architectures: Serves as the frozen backbone for projects like the Dual-System V2 architecture, where external modules manage safety and control, avoiding the "Refusal Re-Injection Trap" observed with adapters trained on censored models.
