failspy/Smaug-Llama-3-70B-Instruct-abliterated-v3

Parameters: 70B
Quantization: FP8
Context length: 8192
License: llama3
Overview

What is this model about?

failspy/Smaug-Llama-3-70B-Instruct-abliterated-v3 is a 70-billion-parameter, instruction-tuned variant of abacusai/Smaug-Llama-3-70B-Instruct. Its distinguishing feature is a process called "abliteration" (orthogonalization), which modifies specific weights to reduce the model's tendency to refuse user requests. The technique is based on the research presented in 'Refusal in LLMs is mediated by a single direction'. The goal is an "uncensored" model: the refusal direction is removed while the model's other behaviors and knowledge are left as they were.
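As a rough illustration of what orthogonalization means here, the sketch below projects a single "refusal direction" out of a weight matrix, so that the matrix can no longer write anything along that direction: W' = W - r rᵀ W, with r normalized to unit length. This follows the general idea in the cited research and is not this model's actual code. The function names, the toy matrix, and the use of plain Python lists (instead of a tensor library) are all assumptions made for illustration.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in vector]


def orthogonalize(weight, direction):
    """Return W' = W - r r^T W, where r is the unit-normalized direction.

    `weight` is a list of rows with shape (d_out, d_in); `direction` has
    length d_out, i.e. it lives in the space the matrix writes into.
    """
    r = normalize(direction)
    n_rows, n_cols = len(weight), len(weight[0])
    # r^T W: how much each column of W points along r.
    along_r = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Subtract that component from every column.
    return [[weight[i][j] - r[i] * along_r[j] for j in range(n_cols)] for i in range(n_rows)]


def matvec(weight, x):
    """Multiply a (d_out, d_in) matrix by a length-d_in vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy example (hypothetical values): a 3x2 matrix and a direction in output space.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
refusal_dir = [1.0, 1.0, 0.0]
W_ablated = orthogonalize(W, refusal_dir)

# Whatever the input, the output now has no component along the direction.
output = matvec(W_ablated, [0.3, -0.7])
leftover = dot(normalize(refusal_dir), output)
```

In this toy case the third row of the matrix is untouched, because the chosen direction has no component there. That is the sense in which the edit is "surgical": only the part of the weights aligned with the direction changes.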

What makes THIS different from all the other models?

Unlike traditional fine-tuning, which changes model behavior broadly, this model uses a more "surgical" ablation methodology. The approach removes one specific undesirable feature (refusal) using significantly less data than fine-tuning, and it preserves the original model's knowledge and training. The intent is the purest possible form of an uncensored model: refusal is inhibited, and no other behavior is introduced or changed. The methodology is new and distinct from typical "uncensored" fine-tunes.
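One reason a direction-based method can need far less data than fine-tuning is that the direction itself can be estimated from a modest set of examples. In the cited research, the estimate is a difference of means: average the model's internal activations over prompts that trigger refusal, average them over prompts that do not, and subtract. The sketch below assumes that approach. The two-dimensional activation vectors and the function names are hypothetical, and nothing here is taken from this model's actual pipeline.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def estimate_direction(refused_acts, complied_acts):
    """Unit vector pointing from the 'complied' mean to the 'refused' mean."""
    mean_refused = mean_vector(refused_acts)
    mean_complied = mean_vector(complied_acts)
    diff = [a - b for a, b in zip(mean_refused, mean_complied)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("the two sets have identical means; no direction found")
    return [x / norm for x in diff]


# Toy activations (hypothetical): the first coordinate separates the two sets.
refused = [[2.0, 1.0], [4.0, 1.0]]
complied = [[0.0, 1.0], [0.0, 1.0]]
direction = estimate_direction(refused, complied)
```

With real models the activations would be high-dimensional vectors captured at a chosen layer and token position, but the arithmetic is the same.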

Should I use this for my use case?

Good for:

  • Applications requiring less restrictive content generation: If your use case benefits from a model less prone to ethical lecturing or refusal, this model is designed for that.
  • Exploring novel LLM modification techniques: Developers interested in the "abliteration" or orthogonalization methodology for feature removal/augmentation may find this model a valuable testbed.
  • Maintaining original model knowledge: If you need the core capabilities of the base Smaug-Llama-3-70B-Instruct but with reduced refusal, this model retains that knowledge.

Considerations:

  • Potential for quirks: As the methodology is new, the model may exhibit unforeseen quirks or side effects. Users are encouraged to report these.
  • Not a general-purpose behavioral change: This model is specifically modified for refusal inhibition; it does not introduce other new behaviors or capabilities beyond the base model.