failspy/Phi-3-medium-4k-instruct-abliterated-v3

TEXT GENERATION
Concurrency Cost: 1 · Model Size: 14.7B · Quant: FP8 · Ctx Length: 32k · Published: May 22, 2024 · License: MIT · Architecture: Transformer · Open Weights

failspy/Phi-3-medium-4k-instruct-abliterated-v3 is a 14.7-billion-parameter instruction-tuned causal language model based on Microsoft's Phi-3-medium-4k-instruct. Its orthogonalized bfloat16 safetensor weights have been manipulated specifically to inhibit refusal behaviors while leaving other capabilities intact. The result is an "uncensored" model that retains the original's knowledge and training, suited to applications that need direct responses without ethical lecturing or refusals.


Overview

This model, failspy/Phi-3-medium-4k-instruct-abliterated-v3, is a modified version of Microsoft's Phi-3-medium-4k-instruct. It utilizes orthogonalized bfloat16 safetensor weights to specifically inhibit the model's tendency to refuse requests or lecture on ethics/safety. This technique, referred to as "abliteration," aims to remove refusal features while preserving the original model's knowledge and training.
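
Because the weights are a drop-in replacement for the base model, it should load with the standard Hugging Face `transformers` API. A minimal usage sketch follows; the model ID comes from this card, while the prompt helper assumes the Phi-3 chat template is unchanged from the base model, and the loading code is guarded behind an environment variable because it downloads the full bfloat16 weights:

```python
import os

def build_phi3_prompt(user_message: str) -> str:
    """Single-turn prompt in the Phi-3 chat template (assumed identical
    to the base Phi-3-medium-4k-instruct template)."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"

# Guarded: loading pulls the full ~14.7B-parameter bfloat16 checkpoint.
if os.environ.get("RUN_ABLITERATED_DEMO"):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "failspy/Phi-3-medium-4k-instruct-abliterated-v3"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(
        build_phi3_prompt("Explain what orthogonalized weights are."),
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                           skip_special_tokens=True))
```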

Key Capabilities & Methodology

  • Refusal Inhibition: The primary modification is the manipulation of specific weights to reduce refusal behaviors, based on the methodology described in the paper 'Refusal in LLMs is mediated by a single direction'.
  • Preservation of Original Behavior: Aside from the reduced refusal, the model is intended to behave identically to the original Phi-3-medium-4k-instruct, maintaining its core capabilities and knowledge.
  • Surgical Modification: Orthogonalization is presented as more surgical than traditional fine-tuning, able to induce or remove very specific features with far less data.
  • Uncensored Nature: It aims to provide an "uncensored" experience without introducing new or changed behaviors beyond the removal of refusal.
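
The "single direction" idea behind the bullets above can be sketched numerically: given a unit refusal direction r in a layer's output space, a weight matrix W that writes to the residual stream is replaced by (I − r rᵀ) W, so the layer can no longer emit any component along r. A toy NumPy illustration (the matrix and direction here are random stand-ins, not an actual extracted refusal direction):

```python
import numpy as np

def orthogonalize_weights(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of W's output (row) space.

    W maps hidden states to outputs as y = W @ x, so removing the
    component of each column along refusal_dir means the layer can no
    longer write along that direction: W' = (I - r r^T) W.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit vector
    return W - np.outer(r, r @ W)

# Toy check: after orthogonalization, outputs have no component along r.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))       # stand-in weight matrix
r = rng.normal(size=8)             # stand-in "refusal" direction
W_abl = orthogonalize_weights(W, r)

x = rng.normal(size=16)
r_unit = r / np.linalg.norm(r)
print(abs(r_unit @ (W_abl @ x)))   # ~0: the direction is unreachable
```

In the real procedure the direction is estimated from activation differences between refused and complied-with prompts, and the projection is applied to every matrix writing to the residual stream; this snippet only shows the linear-algebra core.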

Use Cases & Considerations

  • Direct Response Applications: Ideal for scenarios where direct answers are preferred without ethical caveats or refusals.
  • Exploration of Ablation: Serves as an example of applying ablation/orthogonalization techniques to modify specific model behaviors.
  • Potential for Quirks: As the methodology is new, users are encouraged to report any unexpected behaviors or "quirks" observed during use.
  • Further Refinement: The creator encourages community experimentation, such as stacking this abliterated model with fine-tuning, to explore its full potential.