richardyoung/Mistral-7B-Instruct-v0.2-abliterated-obliteratus
richardyoung/Mistral-7B-Instruct-v0.2-abliterated-obliteratus is a 7-billion-parameter instruction-tuned language model, derived from Mistral-7B-Instruct-v0.2, that has undergone an "abliteration" process to remove refusal behaviors. Developed by Richard Young using the OBLITERATUS method, the model reports 85/100 refusals on its evaluation set and an Attack Success Rate (ASR) of 15.0%. It is designed for research into uncensored model behavior and the study of refusal mechanisms in LLMs.
Model Overview
This model, Mistral-7B-Instruct-v0.2-abliterated-obliteratus, is a 7-billion-parameter variant of the original Mistral-7B-Instruct-v0.2. It has been modified by Richard Young using a technique called OBLITERATUS to remove inherent refusal behaviors, effectively uncensoring the base model. The process identifies a "refusal direction" in the model's residual-stream activation space and orthogonalizes the model's weights against it.
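The exact OBLITERATUS procedure is described in the paper cited below. As a rough, hypothetical sketch of the general abliteration idea only (not the author's actual pipeline), the snippet computes a refusal direction from the difference of mean activations on harmful versus harmless prompts, then projects that direction out of a weight matrix. All names and shapes here (harmful_acts, harmless_acts, W) are placeholder assumptions; in practice the activations would be collected with forward hooks on the residual stream.

```python
import torch

# Placeholder activations from the residual stream at one layer:
# rows are prompts, columns are hidden dimensions (4096 for Mistral-7B).
harmful_acts = torch.randn(128, 4096)   # hypothetical data
harmless_acts = torch.randn(128, 4096)  # hypothetical data

# Refusal direction: difference of mean activations, normalized.
refusal_dir = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
refusal_dir = refusal_dir / refusal_dir.norm()

def orthogonalize(W: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of W's output along `direction`, so the layer
    can no longer write into the refusal direction: W' = (I - d d^T) W."""
    proj = torch.outer(direction, direction)  # rank-1 projector d d^T
    return W - proj @ W

# Applied to e.g. an attention output or MLP down-projection matrix.
W = torch.randn(4096, 14336)  # hypothetical (out_dim, in_dim) weight
W_abliterated = orthogonalize(W, refusal_dir)
```

Repeating this over the relevant matrices at each layer yields a model whose activations can no longer move along the identified refusal direction.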
Abliteration Results
The abliteration process significantly altered the model's refusal characteristics:
- Refusals: 85/100 (the model still refuses 85 of 100 evaluation prompts)
- Attack Success Rate (ASR): 15.0%
- KL Divergence from the base model: 0.4224 (a sketch of how such a metric can be computed follows this list)
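The KL divergence quantifies how far the abliterated model's next-token distributions drift from the base model's on a shared prompt set. The following is a minimal, hypothetical sketch of such a measurement, not the author's evaluation code; the logits tensors are placeholders for outputs from the two models on the same batch.

```python
import torch
import torch.nn.functional as F

def mean_kl(base_logits: torch.Tensor, abl_logits: torch.Tensor) -> torch.Tensor:
    """KL(base || abliterated) averaged over the batch, measuring how much
    the edited model drifts from the original on the same inputs."""
    base_logp = F.log_softmax(base_logits, dim=-1)
    abl_logp = F.log_softmax(abl_logits, dim=-1)
    # kl_div takes log-probs of the approximating distribution as `input`
    # and, with log_target=True, log-probs of the reference as `target`.
    return F.kl_div(abl_logp, base_logp, log_target=True, reduction="batchmean")

# Hypothetical logits of shape (batch, seq, vocab) from both models:
base_logits = torch.randn(8, 32, 32000)
abl_logits = base_logits + 0.05 * torch.randn_like(base_logits)
print(mean_kl(base_logits, abl_logits).item())
```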
Research Context
This model is a direct outcome of research detailed in the paper "Comparative Analysis of LLM Abliteration Methods: Scaling to MoE Architectures and Modern Tools" by Richard Young (2026), arXiv:2512.13655. It serves as a research artifact for studying the effects and methodologies of removing safety guardrails from large language models.
Intended Use
This model is released for research purposes only. Users should be aware that the abliteration process removes safety guardrails, and the model may generate content that is harmful, illegal, or unethical. It is part of the Uncensored and Abliterated LLMs collection, emphasizing its role in academic and experimental contexts rather than general deployment.
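For researchers who do need to load it, a minimal usage sketch follows, assuming the checkpoint works with the standard Hugging Face transformers API and the usual Mistral chat template (not verified against this specific repository):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "richardyoung/Mistral-7B-Instruct-v0.2-abliterated-obliteratus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain what abliteration does to a model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```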