failspy/Llama-3-70B-Instruct-abliterated-v3

Text Generation

  • Concurrency Cost: 4
  • Model Size: 70B
  • Quant: FP8
  • Context Length: 8k
  • Published: May 19, 2024
  • License: llama3
  • Architecture: Transformer

failspy/Llama-3-70B-Instruct-abliterated-v3 is a 70 billion parameter instruction-tuned causal language model based on Meta's Llama-3 architecture. Developed by failspy, this model utilizes an orthogonalization methodology to specifically inhibit refusal behaviors, making it "uncensored" without altering other core functionalities. It maintains the original Llama-3-70B-Instruct's capabilities while offering a more direct response style, suitable for applications requiring unfiltered output.


Overview

failspy/Llama-3-70B-Instruct-abliterated-v3 is a 70 billion parameter instruction-tuned model derived from meta-llama/Meta-Llama-3-70B-Instruct. Its core innovation lies in the application of an "abliteration" technique, specifically orthogonalization of bfloat16 safetensor weights, to inhibit the model's tendency to refuse requests. This process aims to create an "uncensored" model that retains all other behaviors and knowledge of the original Llama-3-70B-Instruct, without introducing new or changed functionalities beyond the removal of refusal.

Key Capabilities

  • Refusal Inhibition: The primary feature is the surgical removal of refusal behaviors, allowing for more direct responses to user prompts.
  • Preservation of Original Model Qualities: The methodology is designed to keep the original Llama-3-70B-Instruct's knowledge and training intact, minimizing side effects.
  • Efficient Feature Modification: Ablation offers a more surgical and data-efficient approach to modifying specific model features compared to extensive fine-tuning.

Methodology Insights

This model's development is based on the concept that refusal in LLMs is mediated by a single direction, as explored in the paper 'Refusal in LLMs is mediated by a single direction'. The orthogonalization technique is presented as a precise method for inducing or removing very specific features, potentially reducing the need for extensive system prompt engineering. It is highlighted as a complementary or alternative approach to fine-tuning, especially for targeted behavioral changes.
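The core of the orthogonalization idea can be sketched in a few lines of NumPy. This is an illustrative toy, not failspy's actual implementation: it assumes a single unit-norm "refusal direction" `r` has already been estimated (e.g. from activation differences on harmful vs. harmless prompts, as in the paper above), and shows how projecting that direction out of a weight matrix guarantees the layer's output has no component along it.

```python
import numpy as np

def orthogonalize(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of W's output that lies along direction r.

    W: (d_out, d_in) weight matrix writing into a d_out-dim residual stream.
    r: (d_out,) direction to ablate (normalized internally).

    Returns W' = (I - r r^T) W, so r^T (W' x) == 0 for every input x.
    """
    r = r / np.linalg.norm(r)        # ensure unit norm
    return W - np.outer(r, r) @ W    # subtract the projection onto r

# Toy demonstration with random weights and a random "refusal" direction.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)

W_abl = orthogonalize(W, r)

# After ablation, outputs carry no component along r, while the rest of
# the transformation is untouched -- the "surgical" property noted above.
x = rng.standard_normal(4)
residual = np.dot(r / np.linalg.norm(r), W_abl @ x)
print(abs(residual) < 1e-9)
```

In the real model this projection would be applied to the bfloat16 safetensor weights that write into the residual stream, which is why the edit removes refusal without retraining.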

Good for

  • Use cases requiring unfiltered or direct responses from an LLM.
  • Developers interested in exploring models with specific behavioral modifications achieved through surgical weight manipulation rather than broad fine-tuning.
  • Research into model interpretability and targeted feature control.

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model adjust the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
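For context, these parameters typically travel together in a single request body on OpenAI-compatible endpoints. The values below are placeholders chosen for illustration, not the actual user configurations, which are only visible in the Featherless UI:

```python
# Hypothetical sampler config; values are illustrative defaults,
# not the real "top 3" combinations from Featherless users.
sampler_config = {
    "temperature": 0.8,         # randomness of token sampling
    "top_p": 0.95,              # nucleus sampling cumulative-prob cutoff
    "top_k": 40,                # keep only the k most likely tokens
    "frequency_penalty": 0.0,   # penalize tokens by how often they occurred
    "presence_penalty": 0.0,    # penalize tokens that appeared at all
    "repetition_penalty": 1.1,  # multiplicative penalty on repeats
    "min_p": 0.05,              # drop tokens below this relative probability
}
print(sorted(sampler_config))
```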