failspy/Phi-3-mini-128k-instruct-abliterated-v3

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 4k · Published: May 26, 2024 · License: MIT · Architecture: Transformer

failspy/Phi-3-mini-128k-instruct-abliterated-v3 is a 4-billion-parameter instruction-tuned causal language model based on Microsoft's Phi-3-mini-128k-instruct. The model has undergone an "abliteration" process, in which its bfloat16 safetensors weights are orthogonalized to specifically inhibit refusal behaviors. It retains the original model's capabilities while aiming for an uncensored response style, making it suitable for applications that require direct answers without ethical lecturing.


Phi-3-mini-128k-instruct-abliterated-v3 Overview

This model, developed by failspy, is a modified version of Microsoft's Phi-3-mini-128k-instruct. It features 4 billion parameters and has been processed using a refined "abliteration" methodology. This technique involves orthogonalizing specific bfloat16 safetensor weights to inhibit the model's tendency to express refusal, based on research into refusal directions in LLMs.

Key Characteristics & Methodology

  • "Abliterated" for Uncensored Responses: The core differentiator is the manipulation of weights to reduce refusal behaviors, aiming for a more direct and uncensored interaction style without altering other core functionalities.
  • Orthogonalization: This surgical technique modifies specific features (like refusal) with significantly less data than traditional fine-tuning, preserving the original model's knowledge and training.
  • Stability: Despite the modifications, the model is generally as stable as the original Phi-3-mini-128k-instruct, though it may exhibit a slightly higher propensity for hallucination.
  • Experimental Nature: As the methodology is new, users are encouraged to report any "quirks" or unexpected behaviors to help refine the process.
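To make the orthogonalization idea concrete, here is a minimal toy sketch (not the author's actual pipeline; the matrix, direction, and dimensions are made-up stand-ins). Given a unit "refusal direction" in a model's residual stream, a weight matrix that writes into that stream can be modified so its outputs have no component along that direction:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                              # toy hidden size (real models use thousands)
W = rng.standard_normal((d, d))    # stand-in for an output-projection weight
r = rng.standard_normal(d)
r /= np.linalg.norm(r)             # hypothetical unit "refusal direction"

# Project the refusal direction out of everything W writes:
# W' = (I - r r^T) W
W_abliterated = W - np.outer(r, r) @ W

# Any input now yields an output orthogonal to the refusal direction.
x = rng.standard_normal(d)
print(abs(r @ (W_abliterated @ x)))  # ~0
```

Because only the component along one direction is removed, the rest of the weight matrix (and hence the model's broader knowledge) is left untouched, which is why the technique needs far less data than fine-tuning.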

When to Consider This Model

  • Direct, Unfiltered Responses: Ideal for use cases where the primary goal is to receive direct answers without the model lecturing on ethics or safety.
  • Exploration of Ablation Techniques: Suited to developers interested in experimenting with, or building upon, novel weight-manipulation methods that target specific behavioral changes.
  • Research into LLM Behavior Modification: Useful for studying the effects of orthogonalization on model outputs and exploring its potential for targeted feature removal or augmentation.
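For researchers studying this kind of behavior modification, the refusal-direction work referenced above typically estimates the direction as the normalized difference between mean activations on contrasting prompt sets. A toy numpy sketch of that estimation step, with synthetic activations standing in for real residual-stream captures (the data, dimensions, and shift are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # toy hidden size

# Synthetic stand-ins for activations collected at one layer; a real
# pipeline would run harmful vs. harmless prompts through the model.
# Here the "refusing" activations are artificially shifted along axis 0.
refusing_acts = rng.standard_normal((32, d)) + 2.0 * np.eye(d)[0]
complying_acts = rng.standard_normal((32, d))

# Candidate refusal direction: difference of means, normalized.
r = refusing_acts.mean(axis=0) - complying_acts.mean(axis=0)
r /= np.linalg.norm(r)
```

The resulting unit vector is what a weight-orthogonalization pass would then project out of the model's matrices.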