failspy/Phi-3-medium-4k-instruct-abliterated-v3

TEXT GENERATION
Concurrency Cost: 1 · Model Size: 14.7B · Quant: FP8 · Ctx Length: 32k · Published: May 22, 2024 · License: MIT · Architecture: Transformer · Open Weights

failspy/Phi-3-medium-4k-instruct-abliterated-v3 is a 14.7-billion-parameter instruction-tuned causal language model based on Microsoft's Phi-3-medium-4k-instruct. Its orthogonalized bfloat16 safetensor weights have been manipulated specifically to inhibit refusal behaviors while leaving other capabilities intact. The result is an "uncensored" model that retains the original's knowledge and training, suited to applications that need direct responses without ethical lecturing or refusals.


Overview

This model, failspy/Phi-3-medium-4k-instruct-abliterated-v3, is a modified version of Microsoft's Phi-3-medium-4k-instruct. It utilizes orthogonalized bfloat16 safetensor weights to specifically inhibit the model's tendency to refuse requests or lecture on ethics/safety. This technique, referred to as "abliteration," aims to remove refusal features while preserving the original model's knowledge and training.
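
Because the weights are a drop-in replacement for the base model, it should load with the standard Hugging Face `transformers` API. A minimal usage sketch follows; the model ID comes from this card, while the prompt helper assumes the Phi-3 chat template is unchanged from the base model, and the loading code is guarded behind an environment variable because it downloads the full bfloat16 weights:

```python
import os

def build_phi3_prompt(user_message: str) -> str:
    """Single-turn prompt in the Phi-3 chat template (assumed identical
    to the base Phi-3-medium-4k-instruct template)."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"

# Guarded: loading pulls the full ~14.7B-parameter bfloat16 checkpoint.
if os.environ.get("RUN_ABLITERATED_DEMO"):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "failspy/Phi-3-medium-4k-instruct-abliterated-v3"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(
        build_phi3_prompt("Explain what orthogonalized weights are."),
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                           skip_special_tokens=True))
```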

Key Capabilities & Methodology

  • Refusal Inhibition: The primary modification is the manipulation of specific weights to reduce refusal behaviors, based on the methodology described in the paper 'Refusal in LLMs is mediated by a single direction'.
  • Preservation of Original Behavior: Aside from the reduced refusal, the model is intended to behave identically to the original Phi-3-medium-4k-instruct, maintaining its core capabilities and knowledge.
  • Surgical Modification: Orthogonalization is presented as more surgical than traditional fine-tuning, able to induce or remove very specific features with far less data.
  • Uncensored Nature: It aims to provide an "uncensored" experience without introducing new or changed behaviors beyond the removal of refusal.
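
The "single direction" idea behind the bullets above can be sketched numerically: given a unit refusal direction r in a layer's output space, a weight matrix W that writes to the residual stream is replaced by (I − r rᵀ) W, so the layer can no longer emit any component along r. A toy NumPy illustration (the matrix and direction here are random stand-ins, not an actual extracted refusal direction):

```python
import numpy as np

def orthogonalize_weights(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of W's output (row) space.

    W maps hidden states to outputs as y = W @ x, so removing the
    component of each column along refusal_dir means the layer can no
    longer write along that direction: W' = (I - r r^T) W.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit vector
    return W - np.outer(r, r @ W)

# Toy check: after orthogonalization, outputs have no component along r.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))       # stand-in weight matrix
r = rng.normal(size=8)             # stand-in "refusal" direction
W_abl = orthogonalize_weights(W, r)

x = rng.normal(size=16)
r_unit = r / np.linalg.norm(r)
print(abs(r_unit @ (W_abl @ x)))   # ~0: the direction is unreachable
```

In the real procedure the direction is estimated from activation differences between refused and complied-with prompts, and the projection is applied to every matrix writing to the residual stream; this snippet only shows the linear-algebra core.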

Use Cases & Considerations

  • Direct Response Applications: Ideal for scenarios where direct answers are preferred without ethical caveats or refusals.
  • Exploration of Ablation: Serves as an example of applying ablation/orthogonalization techniques to modify specific model behaviors.
  • Potential for Quirks: As the methodology is new, users are encouraged to report any unexpected behaviors or "quirks" observed during use.
  • Further Refinement: The creator encourages community experimentation, such as stacking this abliterated model with fine-tuning, to explore its full potential.