failspy/llama-3-70B-Instruct-abliterated

Hugging Face
TEXT GENERATIONConcurrency Cost:4Model Size:70BQuant:FP8Ctx Length:8kPublished:May 7, 2024License:llama3Architecture:Transformer0.1K Warm

The failspy/llama-3-70B-Instruct-abliterated model is a 70 billion parameter instruction-tuned language model, derived from Meta's Llama-3-70B-Instruct. This model has undergone specific weight manipulation to orthogonalize the refusal direction, aiming to inhibit the model's tendency to express refusal or lecture on ethics. It maintains the original Llama-3-70B-Instruct tuning in all other aspects, offering an 8192-token context length. Its primary differentiator is the experimental reduction of refusal behaviors, making it suitable for use cases where direct responses are preferred over ethical caveats.

Loading preview...

Model Overview

The failspy/llama-3-70B-Instruct-abliterated is an experimental 70 billion parameter instruction-tuned model based on Meta's Llama-3-70B-Instruct. Its core innovation lies in the application of a methodology described in the paper "Refusal in LLMs is mediated by a single direction". This involves manipulating specific bfloat16 safetensor weights to orthogonalize the 'refusal direction', aiming to reduce the model's propensity to refuse requests or provide ethical lectures.

Key Characteristics

  • Abliterated Refusal: The model has been modified to inhibit refusal behaviors, though it is not guaranteed to eliminate them entirely.
  • Llama-3-70B-Instruct Base: Retains the core capabilities and tuning of the original Llama-3-70B-Instruct model.
  • Experimental Nature: This is a novel application of ablation, and users are encouraged to explore and report any unique quirks or side effects.
  • Tinkering Friendly: The refusal_dir.pth file is included, allowing users to apply the orthogonalization to their own downloaded Llama-3-70B-Instruct models using the provided ortho_cookbook.ipynb.

Use Cases

This model is particularly suited for:

  • Research and Experimentation: Ideal for exploring the effects of refusal direction ablation on LLM behavior.
  • Applications requiring direct responses: Where minimizing ethical caveats or refusals is a priority, understanding the experimental nature.
  • Developers interested in model modification: For those who wish to apply similar methodologies or further develop this approach.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p