Model Overview
failspy/llama-3-70B-Instruct-abliterated is an experimental 70-billion-parameter instruction-tuned model based on Meta's Llama-3-70B-Instruct. Its core change is the application of the methodology described in the paper "Refusal in LLMs is mediated by a single direction": specific bfloat16 safetensors weights are orthogonalized with respect to the 'refusal direction', so that those weights can no longer write along it. The aim is to reduce the model's propensity to refuse requests or deliver ethical lectures.
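To make the idea of orthogonalizing a weight matrix against a direction concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the code used to produce this model: the function name orthogonalize, the toy shapes, and the random stand-in for the refusal direction are all hypothetical.

```python
import numpy as np

def orthogonalize(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of the layer's output that lies along `direction`.

    Assumes `weight` has shape (d_model, d_in) and maps an input x to
    weight @ x, and that `direction` has shape (d_model,).
    """
    r = direction / np.linalg.norm(direction)  # unit-length direction
    # W' = W - r (r^T W): subtract each column's projection onto r.
    return weight - np.outer(r, r @ weight)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))        # toy stand-in for a real weight matrix
refusal_dir = rng.normal(size=8)   # toy stand-in for an extracted direction

W_ablated = orthogonalize(W, refusal_dir)
x = rng.normal(size=4)
# Component of the ablated layer's output along the direction (about zero).
component = float(refusal_dir @ (W_ablated @ x))
```

After the projection, no input can make this matrix produce output along the chosen direction, which is the property the orthogonalization relies on.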
Key Characteristics
- Abliterated Refusal: The model has been modified to inhibit refusal behaviors, though it is not guaranteed to eliminate them entirely.
- Llama-3-70B-Instruct Base: Retains the core capabilities and tuning of the original Llama-3-70B-Instruct model.
- Experimental Nature: This is a novel application of ablation, and users are encouraged to explore and report any unique quirks or side effects.
- Tinkering Friendly: The refusal_dir.pth file is included, allowing users to apply the orthogonalization to their own downloaded Llama-3-70B-Instruct models using the provided ortho_cookbook.ipynb.
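As a rough picture of what applying the orthogonalization to a downloaded checkpoint could involve, the sketch below walks a dictionary of named matrices and projects the direction out of those that write into the residual stream. Everything here is an assumption for illustration: the parameter-name suffixes follow the Hugging Face Llama naming convention as commonly seen, the choice of which matrices to modify is a guess, the helper ablate_state_dict is hypothetical, and a random vector stands in for the contents of refusal_dir.pth, whose format is not described here. The notebook shipped with the model remains the authoritative procedure.

```python
import numpy as np

# Assumed name suffixes for matrices whose outputs land in the residual stream.
OUTPUT_SUFFIXES = ("self_attn.o_proj.weight", "mlp.down_proj.weight")
EMBED_SUFFIX = "embed_tokens.weight"

def ablate_state_dict(state: dict, direction: np.ndarray) -> dict:
    """Return a copy of `state` with `direction` projected out of selected matrices."""
    r = direction / np.linalg.norm(direction)
    out = {}
    for name, w in state.items():
        if name.endswith(OUTPUT_SUFFIXES):
            # Shape (d_model, d_in): the output is w @ x, so project on the left.
            out[name] = w - np.outer(r, r @ w)
        elif name.endswith(EMBED_SUFFIX):
            # Shape (vocab, d_model): each row is itself a residual-stream vector.
            out[name] = w - np.outer(w @ r, r)
        else:
            out[name] = w.copy()  # left untouched
    return out

rng = np.random.default_rng(1)
d_model, d_ff, vocab = 6, 10, 12   # toy sizes, far smaller than the real model
state = {
    "model.embed_tokens.weight": rng.normal(size=(vocab, d_model)),
    "model.layers.0.self_attn.o_proj.weight": rng.normal(size=(d_model, d_model)),
    "model.layers.0.mlp.down_proj.weight": rng.normal(size=(d_model, d_ff)),
    "model.layers.0.mlp.up_proj.weight": rng.normal(size=(d_ff, d_model)),
}
direction = rng.normal(size=d_model)  # stand-in for the shipped direction
ablated = ablate_state_dict(state, direction)
```

The two branches differ only in orientation: output projections are corrected on the left because their rows index the residual stream, while the embedding table is corrected on the right because its columns do.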
Use Cases
This model is particularly suited for:
- Research and Experimentation: Ideal for exploring the effects of refusal direction ablation on LLM behavior.
- Applications requiring direct responses: Suited to cases where minimizing ethical caveats or refusals is a priority, provided the model's experimental nature is taken into account.
- Developers interested in model modification: For those who wish to apply similar methodologies or further develop this approach.