richardyoung/zephyr-7b-beta-abliterated

Task: Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Dec 15, 2025 · License: MIT · Architecture: Transformer · Concurrency cost: 1

The richardyoung/zephyr-7b-beta-abliterated model is an uncensored version of the Zephyr-7B-beta language model, developed by Richard Young. It was created using the Heretic v1.1 technique to remove refusal behaviors, achieving a 98.0% Attack Success Rate with only 2/100 refusals. The model is designed for research into LLM abliteration methods and produces less restricted outputs across a wide range of prompts.


Overview

This model, richardyoung/zephyr-7b-beta-abliterated, is an uncensored variant of the original Zephyr-7B-beta model. It was developed by Richard Young using the Heretic v1.1 technique, which aims to remove refusal behaviors from large language models.

Abliteration Details

The abliteration process involves identifying the "refusal direction" in the model's residual-stream activation space and orthogonalizing the model's weights against it, so the network can no longer write along that direction. Key results from this process include:

  • Refusals: 2/100
  • Attack Success Rate (ASR): 98.0%
  • KL Divergence: 0.076
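The orthogonalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Heretic v1.1 implementation: the toy activations, dimensions, and single weight matrix are assumptions for demonstration, whereas real abliteration estimates the refusal direction from many harmful/harmless prompt pairs and applies the projection across every layer that writes to the residual stream.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual-stream activations for two prompt classes (hypothetical data;
# real methods collect these from the model on curated prompt sets).
d_model = 16
acts_harmful = rng.normal(size=(100, d_model))
acts_harmless = rng.normal(size=(100, d_model))

# Estimate the "refusal direction" as the normalized difference of means.
r = acts_harmful.mean(axis=0) - acts_harmless.mean(axis=0)
r = r / np.linalg.norm(r)

# Orthogonalize a weight matrix that writes into the residual stream:
# subtract the component of its output that lies along r.
W = rng.normal(size=(d_model, d_model))
W_abliterated = W - np.outer(W @ r, r)

# The modified weights can no longer write onto the refusal direction.
print(np.allclose(W_abliterated @ r, 0.0))  # True
```

Because `r` is unit-norm, `W_abliterated @ r = W @ r - (W @ r)(r · r) = 0`, which is exactly the property the technique relies on: refusal-related features are removed while all directions orthogonal to `r` pass through unchanged (the source of the small reported KL divergence of 0.076 versus the base model).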

This model is part of the research presented in the paper "Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation" by Richard Young (arXiv: 2512.13655).

Intended Use

This model is released primarily for research into the effects and implications of removing safety guardrails from LLMs. Because the abliteration process removes those guardrails, users are responsible for ensuring appropriate and ethical use. It is not intended for generating harmful, illegal, or unethical content.

An interactive dashboard for abliteration methods is available here.