richardyoung/zephyr-7b-beta-abliterated

Task: Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Dec 15, 2025 · License: MIT · Architecture: Transformer · Concurrency cost: 1

The richardyoung/zephyr-7b-beta-abliterated model is an uncensored version of the Zephyr-7B-beta language model, developed by Richard Young. It was created using the Heretic v1.1 technique to remove refusal behaviors, achieving a 98.0% Attack Success Rate with only 2/100 refusals. The model is designed for research into LLM abliteration methods and produces less restricted outputs across a wide range of prompts.


Overview

This model, richardyoung/zephyr-7b-beta-abliterated, is an uncensored variant of the original Zephyr-7B-beta model. It was developed by Richard Young using the Heretic v1.1 technique, which aims to remove refusal behaviors from large language models.

Abliteration Details

The abliteration process involves identifying the "refusal direction" in the model's residual-stream activation space and orthogonalizing the model's weights against it, so the network can no longer write along that direction. Key results from this process include:

  • Refusals: 2/100
  • Attack Success Rate (ASR): 98.0%
  • KL Divergence: 0.076
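The orthogonalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Heretic v1.1 implementation: the toy activations, dimensions, and single weight matrix are assumptions for demonstration, whereas real abliteration estimates the refusal direction from many harmful/harmless prompt pairs and applies the projection across every layer that writes to the residual stream.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual-stream activations for two prompt classes (hypothetical data;
# real methods collect these from the model on curated prompt sets).
d_model = 16
acts_harmful = rng.normal(size=(100, d_model))
acts_harmless = rng.normal(size=(100, d_model))

# Estimate the "refusal direction" as the normalized difference of means.
r = acts_harmful.mean(axis=0) - acts_harmless.mean(axis=0)
r = r / np.linalg.norm(r)

# Orthogonalize a weight matrix that writes into the residual stream:
# subtract the component of its output that lies along r.
W = rng.normal(size=(d_model, d_model))
W_abliterated = W - np.outer(W @ r, r)

# The modified weights can no longer write onto the refusal direction.
print(np.allclose(W_abliterated @ r, 0.0))  # True
```

Because `r` is unit-norm, `W_abliterated @ r = W @ r - (W @ r)(r · r) = 0`, which is exactly the property the technique relies on: refusal-related features are removed while all directions orthogonal to `r` pass through unchanged (the source of the small reported KL divergence of 0.076 versus the base model).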

This model is part of the research presented in the paper "Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation" by Richard Young (arXiv: 2512.13655).

Intended Use

This model is released primarily for research into the effects and implications of removing safety guardrails from LLMs. Because the abliteration process removes those guardrails, users are responsible for ensuring appropriate and ethical use. It is not intended for generating harmful, illegal, or unethical content.

An interactive dashboard for abliteration methods is available here.