failspy/llama-3-70B-Instruct-abliterated
The failspy/llama-3-70B-Instruct-abliterated model is a 70 billion parameter instruction-tuned language model, derived from Meta's Llama-3-70B-Instruct. This model has undergone specific weight manipulation to orthogonalize the refusal direction, aiming to inhibit the model's tendency to express refusal or lecture on ethics. It maintains the original Llama-3-70B-Instruct tuning in all other aspects, offering an 8192-token context length. Its primary differentiator is the experimental reduction of refusal behaviors, making it suitable for use cases where direct responses are preferred over ethical caveats.
Model Overview
failspy/llama-3-70B-Instruct-abliterated is an experimental 70 billion parameter instruction-tuned model based on Meta's Llama-3-70B-Instruct. It applies the methodology described in the paper "Refusal in LLMs is mediated by a single direction": specific bfloat16 safetensor weights are manipulated so that they are orthogonal to the 'refusal direction', with the aim of reducing the model's propensity to refuse requests or deliver ethical lectures.
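As a rough illustration of what "orthogonalizing against a direction" means, the following math sketch uses notation introduced here for clarity (it is an assumption, not taken from this model card): $\hat{r}$ is a unit-norm refusal direction in the model's hidden space, and $W$ is any weight matrix whose output is written into that space. Which matrices are actually modified in this model is not stated here.

```latex
% Assumed notation: \hat{r} is a unit vector (\hat{r}^\top \hat{r} = 1),
% W is a weight matrix whose outputs live in the same space as \hat{r}.
\begin{align}
  W' &= W - \hat{r}\,\hat{r}^\top W = \left(I - \hat{r}\,\hat{r}^\top\right) W \\
  \hat{r}^\top W' &= \hat{r}^\top W - \left(\hat{r}^\top \hat{r}\right)\hat{r}^\top W
                   = \hat{r}^\top W - \hat{r}^\top W = 0
\end{align}
```

Under these assumptions, every output of $W'$ has zero component along $\hat{r}$, so the modified matrix can no longer write along that direction. This constrains the weights only; it is consistent with the card's caveat that refusals are inhibited rather than guaranteed to disappear.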
Key Characteristics
- Abliterated Refusal: The model has been modified to inhibit refusal behaviors, though it is not guaranteed to eliminate them entirely.
- Llama-3-70B-Instruct Base: Retains the core capabilities and tuning of the original Llama-3-70B-Instruct model.
- Experimental Nature: This is a novel application of ablation, and users are encouraged to explore and report any unique quirks or side effects.
- Tinkering Friendly: The refusal_dir.pth file is included, allowing users to apply the orthogonalization to their own downloaded Llama-3-70B-Instruct models using the provided ortho_cookbook.ipynb.
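For readers who want a feel for the orthogonalization step before opening the notebook, here is a minimal, dependency-free sketch on a toy matrix. It is not the code from ortho_cookbook.ipynb and does not load refusal_dir.pth; the function names (`normalize`, `orthogonalize`, `matvec`, `dot`) and the toy values are hypothetical, and the row layout (one row per output dimension) is an assumption made for this illustration.

```python
import math


def normalize(vector):
    """Return the unit-norm version of a vector given as a list of floats."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in weight]


def orthogonalize(weight, direction):
    """Remove the component along `direction` from the outputs of `weight`.

    Assumption: `weight` has one row per output dimension, and `direction`
    lives in that output space. Computes W' = W - r (r^T W) with r unit-norm.
    """
    r = normalize(direction)
    n_out, n_in = len(weight), len(weight[0])
    if len(r) != n_out:
        raise ValueError("direction length must match the number of rows")
    # r^T W: how strongly each input column writes along the direction.
    projection = [sum(r[i] * weight[i][j] for i in range(n_out)) for j in range(n_in)]
    return [
        [weight[i][j] - r[i] * projection[j] for j in range(n_in)]
        for i in range(n_out)
    ]


# Toy stand-ins: a 3x2 matrix and a made-up direction (not real model data).
toy_weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_direction = [1.0, 1.0, 1.0]
toy_ablated = orthogonalize(toy_weight, toy_direction)

# Any output of the ablated matrix should have (numerically) no component
# along the direction.
residual = dot(matvec(toy_ablated, [0.3, -0.7]), toy_direction)
```

In a real model the same projection would be applied with a tensor library to full-size matrices; the notebook shipped with the model remains the reference for which weights are touched and how the direction file is read.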
Use Cases
This model is particularly suited for:
- Research and Experimentation: Ideal for exploring the effects of refusal direction ablation on LLM behavior.
- Applications requiring direct responses: Where minimizing ethical caveats or refusals is a priority, provided the model's experimental nature is taken into account.
- Developers interested in model modification: For those who wish to apply similar methodologies or further develop this approach.