Phi-3-mini-128k-instruct-abliterated-v3 Overview
This model, developed by failspy, is a modified version of Microsoft's Phi-3-mini-128k-instruct. It has roughly 3.8 billion parameters and has been processed with a refined "abliteration" methodology: specific bfloat16 safetensor weights are orthogonalized to inhibit the model's tendency to express refusal, building on research into refusal directions in LLMs.
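As a rough sketch of the underlying idea (not failspy's actual script), the refusal-direction research estimates a single direction in activation space as the difference of mean activations between refused and answered prompts. The arrays below are synthetic placeholders standing in for captured residual-stream activations:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # hidden size; a toy value for illustration

# Hypothetical activations captured at one layer for prompts the model
# refuses vs. prompts it answers normally (synthetic data here).
harmful_acts = rng.normal(size=(32, d_model)) + np.array([2.0] + [0.0] * (d_model - 1))
harmless_acts = rng.normal(size=(32, d_model))

# The "refusal direction" is the normalized difference of the two means.
refusal_dir = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

print(refusal_dir.shape)
```

In practice the direction is estimated per layer from real model activations; the toy offset added to `harmful_acts` simply simulates a consistent shift along one axis.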
Key Characteristics & Methodology
- "Abliterated" for Uncensored Responses: The core differentiator is the manipulation of weights to reduce refusal behaviors, aiming for a more direct and uncensored interaction style without altering other core functionalities.
- Orthogonalization: This surgical technique modifies specific features (like refusal) with significantly less data than traditional fine-tuning, preserving the original model's knowledge and training.
- Stability: Despite the modifications, the model is generally as stable as the original Phi-3-mini-128k-instruct, though it may exhibit a slightly higher propensity for hallucination.
- Experimental Nature: As the methodology is new, users are encouraged to report any "quirks" or unexpected behaviors to help refine the process.
When to Consider This Model
- Direct, Unfiltered Responses: Ideal for use cases where the primary goal is to receive direct answers without the model lecturing on ethics or safety.
- Exploration of Ablation Techniques: Suited to developers who want to experiment with, or build upon, novel weight-manipulation methods for targeted behavioral changes.
- Research into LLM Behavior Modification: Useful for studying the effects of orthogonalization on model outputs and exploring its potential for targeted feature removal or augmentation.