Name: richardyoung/Mistral-7B-Instruct-v0.3-abliterated API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: richardyoung

Model Overview

This model, richardyoung/Mistral-7B-Instruct-v0.3-abliterated, is an uncensored variant of the original Mistral-7B-Instruct-v0.3. It was developed by Richard Young using the Heretic v1.1 abliteration method, a technique designed to remove refusal behaviors from language models.

Key Characteristics

Abliteration Method: Uses Heretic v1.1, which identifies the "refusal direction" in the model's residual-stream activation space and orthogonalizes the model's weights against it, so the model can no longer write along that direction.
Performance Metrics: Achieves an Attack Success Rate (ASR) of 84.0%, meaning the model refused only 16 of 100 test cases and complied with the other 84, a significant reduction in refusal behavior.
Research Context: Developed as part of the research detailed in the paper "Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation" (arXiv:2512.13655).
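
The two ideas above, orthogonalizing against a refusal direction and computing the ASR, can be illustrated with a small sketch. This is not Heretic's implementation; it assumes the commonly described form of directional ablation, W' = W - r (r^T W), applied to a weight matrix whose rows index residual-stream dimensions. The function names (normalize, orthogonalize, attack_success_rate) and the toy 2x2 matrix are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def orthogonalize(weight, direction):
    """Remove the component of each column of `weight` along `direction`.

    Assumption: rows of `weight` index the output (residual-stream)
    dimensions and columns index the input dimensions, so the result
    W' = W - r (r^T W) can no longer write along r.
    """
    r = normalize(direction)
    n_rows, n_cols = len(weight), len(weight[0])
    # coeffs[j] is the projection of column j onto r.
    coeffs = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[weight[i][j] - r[i] * coeffs[j] for j in range(n_cols)]
            for i in range(n_rows)]


def attack_success_rate(refusals, total):
    """ASR in percent: the share of test cases that were not refused."""
    return 100.0 * (total - refusals) / total


# Toy example: a 2x2 "weight matrix" and a made-up refusal direction.
toy_weight = [[1.0, 2.0], [3.0, 4.0]]
toy_direction = [3.0, 4.0]
ablated = orthogonalize(toy_weight, toy_direction)

# The reported figures: 16 refusals out of 100 test cases.
asr = attack_success_rate(16, 100)
```

After ablation, every column of the toy matrix has zero component along the chosen direction, which is the property the method relies on. In a real model the direction would be estimated from activations and the projection applied to the actual weight tensors; neither step is shown here.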

Intended Use Cases

Research Purposes: Primarily intended for academic and research exploration into LLM safety, alignment, and the effects of abliteration techniques.
Behavioral Analysis: Useful for studying how models respond when typical safety guardrails are removed.

Disclaimer

Users should be aware that this model has had its safety guardrails removed. It is released for research purposes only, and users are responsible for ensuring appropriate and ethical use. It should not be used to generate harmful, illegal, or unethical content.

Full Model Card (README)