failspy/llama-3-70B-Instruct-abliterated

Parameters: 70B
Quantization: FP8
Context length: 8192 tokens
License: llama3

Model Overview

failspy/llama-3-70B-Instruct-abliterated is an experimental 70-billion-parameter instruction-tuned model based on Meta's Llama-3-70B-Instruct. It applies the methodology described in the paper "Refusal in LLMs is mediated by a single direction": specific bfloat16 safetensors weights are modified so that they are orthogonal to the 'refusal direction', with the aim of reducing the model's tendency to refuse requests or deliver ethical lectures.
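The orthogonalization step can be illustrated with a small NumPy sketch. It is not taken from the model's repository. It assumes a weight matrix `W` whose outputs are written into the residual stream and a direction vector `r`, and removes the component along `r` via W' = W - r̂ r̂ᵀ W. The function name, the toy shapes, and the random stand-in for the refusal direction are all hypothetical.

```python
import numpy as np


def orthogonalize(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Return a copy of `weight` whose outputs have no component along `direction`.

    weight:    (d_model, d_in) matrix; its outputs live in the residual stream.
    direction: (d_model,) vector; it is normalized to unit length here.
    """
    r_hat = direction / np.linalg.norm(direction)
    # W' = W - r_hat (r_hat^T W): subtract from every column its projection onto r_hat.
    return weight - np.outer(r_hat, r_hat @ weight)


# Toy stand-ins (hypothetical): a random matrix and a random "refusal direction".
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
refusal_dir = rng.normal(size=8)

W_ortho = orthogonalize(W, refusal_dir)

# For any input x, the output of the modified matrix is orthogonal to the direction.
x = rng.normal(size=4)
r_hat = refusal_dir / np.linalg.norm(refusal_dir)
residual = float(r_hat @ (W_ortho @ x))
```

Here `residual` is zero up to floating-point error, which is the sense in which the modified weights can no longer write along the chosen direction.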

Key Characteristics

  • Abliterated Refusal: The model has been modified to inhibit refusal behaviors, though it is not guaranteed to eliminate them entirely.
  • Llama-3-70B-Instruct Base: Apart from the refusal modification, the model is intended to retain the core capabilities and tuning of the original Llama-3-70B-Instruct.
  • Experimental Nature: This is a novel application of ablation, and users are encouraged to explore and report any unique quirks or side effects.
  • Tinkering Friendly: The refusal_dir.pth file is included, so users can apply the same orthogonalization to their own copy of Llama-3-70B-Instruct using the provided ortho_cookbook.ipynb.
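As a rough picture of what applying a saved direction to a set of weights might involve, the sketch below walks a toy dictionary of matrices and orthogonalizes only those that write into the residual stream. It does not reproduce ortho_cookbook.ipynb, and it does not load refusal_dir.pth. The choice of matrices (attention output projection, MLP down projection, token embeddings), the helper names, and the toy dimensions are assumptions. The parameter names follow the Hugging Face Llama naming convention, and NumPy arrays stand in for real tensors.

```python
import numpy as np


def remove_from_outputs(weight: np.ndarray, r_hat: np.ndarray) -> np.ndarray:
    # weight: (d_model, d_in); its outputs are added to the residual stream.
    return weight - np.outer(r_hat, r_hat @ weight)


def remove_from_rows(weight: np.ndarray, r_hat: np.ndarray) -> np.ndarray:
    # weight: (n_rows, d_model), e.g. an embedding table whose rows are residual vectors.
    return weight - np.outer(weight @ r_hat, r_hat)


def ablate_state_dict(state: dict, direction: np.ndarray) -> dict:
    """Return a new dict with the direction removed from residual-writing matrices."""
    r_hat = direction / np.linalg.norm(direction)
    out = {}
    for name, w in state.items():
        if name.endswith(("self_attn.o_proj.weight", "mlp.down_proj.weight")):
            out[name] = remove_from_outputs(w, r_hat)
        elif name.endswith("embed_tokens.weight"):
            out[name] = remove_from_rows(w, r_hat)
        else:
            out[name] = w  # matrices that only read from the stream are left untouched
    return out


# Toy model (hypothetical sizes); a real run would use the model's actual tensors.
d_model, d_ff, vocab = 8, 16, 10
rng = np.random.default_rng(0)
state = {
    "model.embed_tokens.weight": rng.normal(size=(vocab, d_model)),
    "model.layers.0.self_attn.q_proj.weight": rng.normal(size=(d_model, d_model)),
    "model.layers.0.self_attn.o_proj.weight": rng.normal(size=(d_model, d_model)),
    "model.layers.0.mlp.down_proj.weight": rng.normal(size=(d_model, d_ff)),
}
direction = rng.normal(size=d_model)  # stand-in for the saved refusal direction
ablated = ablate_state_dict(state, direction)
```

The two helpers differ only in orientation: linear-layer weights are stored as (out_features, in_features), so the direction is removed from the output side, whereas embedding rows are themselves residual-stream vectors.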

Use Cases

This model is particularly suited for:

  • Research and Experimentation: Ideal for exploring the effects of refusal direction ablation on LLM behavior.
  • Applications requiring direct responses: Where minimizing ethical caveats or refusals is a priority, provided users understand and accept the model's experimental nature.
  • Developers interested in model modification: For those who wish to apply similar methodologies or further develop this approach.