richardyoung/Mistral-7B-Instruct-v0.2-abliterated-obliteratus

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Mar 28, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Weights: Open

richardyoung/Mistral-7B-Instruct-v0.2-abliterated-obliteratus is a 7 billion parameter instruction-tuned language model, derived from Mistral-7B-Instruct-v0.2, that has undergone an "abliteration" process to remove refusal behaviors. Developed by Richard Young using the OBLITERATUS method, the model records 85/100 refusals and an Attack Success Rate (ASR) of 15.0% on its evaluation set. It is designed for research into uncensored model behavior and the study of refusal mechanisms in LLMs.


Model Overview

This model, Mistral-7B-Instruct-v0.2-abliterated-obliteratus, is a 7 billion parameter variant of the original Mistral-7B-Instruct-v0.2. It has been modified by Richard Young using an advanced technique called OBLITERATUS to remove inherent refusal behaviors, effectively uncensoring the base model. This process involves identifying and orthogonalizing the "refusal direction" within the model's residual stream activation space.
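The core idea of directional ablation can be sketched in a few lines of NumPy: estimate a "refusal direction" as the difference of mean activations between harmful and harmless prompts, then project that component out of the residual stream. This is a minimal illustration of the general technique, not the OBLITERATUS implementation itself; the function names and the (n_prompts, d_model) activation layout are assumptions.

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means estimate of the refusal direction.

    harmful_acts / harmless_acts: (n_prompts, d_model) residual-stream
    activations collected at a chosen layer (hypothetical inputs).
    Returns a unit vector of shape (d_model,).
    """
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(acts: np.ndarray, r_hat: np.ndarray) -> np.ndarray:
    """Remove the component of each activation vector along r_hat.

    After this projection, acts @ r_hat is (numerically) zero, i.e. the
    model can no longer 'read' the refusal direction from these activations.
    """
    return acts - np.outer(acts @ r_hat, r_hat)
```

In a full abliteration pipeline the same projection is typically baked into the weight matrices that write to the residual stream, so no runtime hook is needed.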

Abliteration Results

The abliteration process significantly altered the model's refusal characteristics:

  • Refusals: 85/100
  • Attack Success Rate (ASR): 15.0%
  • KL Divergence: 0.4224
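For readers reproducing these numbers, the metrics above might be computed roughly as follows. This is an illustrative sketch, not the paper's evaluation code: ASR is taken here as the percentage of harmful prompts not refused, and KL divergence compares next-token distributions of the abliterated model against the base model.

```python
import numpy as np

def attack_success_rate(refusals: int, total: int) -> float:
    # Percentage of harmful prompts that were NOT refused.
    return 100.0 * (total - refusals) / total

def kl_divergence(p, q, eps: float = 1e-12) -> float:
    # KL(p || q) between two next-token probability distributions,
    # with a small epsilon for numerical stability.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Under this convention, 85 refusals out of 100 prompts yields an ASR of 15.0%, matching the figures above; a KL divergence near 0.42 indicates the ablated model's output distribution stays reasonably close to the base model's.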

Research Context

This model is a direct outcome of research detailed in the paper "Comparative Analysis of LLM Abliteration Methods: Scaling to MoE Architectures and Modern Tools" by Richard Young (2026), available on arXiv: 2512.13655. It serves as a research artifact for studying the effects and methodologies of removing safety guardrails from large language models.

Intended Use

This model is released for research purposes only. Users should be aware that the abliteration process removes safety guardrails, and the model may generate content that is harmful, illegal, or unethical. It is part of the Uncensored and Abliterated LLMs collection, emphasizing its role in academic and experimental contexts rather than general deployment.