Undi95/Phi4-abliterated
Undi95/Phi4-abliterated is a 14.7 billion parameter language model based on the Phi4 architecture, developed by Undi95. It has been modified with a novel "abliteration" methodology to give it a more neutral response profile: the goal is a model that does not refuse neutral prompts, not an uncensored one. It has a 32768 token context length and is intended as a base for fine-tuning that balances reduced censorship with usability and intelligence.
Undi95/Phi4-abliterated: A Neutral Foundation Model
Undi95/Phi4-abliterated is a 14.7 billion parameter model derived from the Phi4 architecture, developed by Undi95. It employs a novel "abliteration" methodology whose aim is a neutral model, one that does not refuse neutral prompts, not an uncensored one. The model is intended as a starting point for further fine-tuning toward a desired balance between reduced censorship and usability.
Key Differentiators & Methodology
Unlike previous abliteration attempts, which applied a single refusal direction uniformly across all layers, this model takes a more refined approach:
- Layer-Specific Refusal Directions: Each layer computes and applies its own refusal direction, which avoids the loss of usability and intelligence observed with earlier methods.
- Targeted Tensor Modification: The refusal direction is applied to four tensors within each layer (o_proj.weight, down_proj.weight, post_attention_layernorm.weight, input_layernorm.weight).
This targeted application lets the model retain more specificity and functionality, avoiding the over-generalization that degraded performance in earlier attempts. There is still a trade-off: applying the refusal direction too strongly increases neutrality but can reduce intelligence, which is why subsequent fine-tuning is needed.
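The per-layer idea can be illustrated with a small, dependency-free sketch on toy data. This is not the author's code. It assumes the common abliteration recipe, where a layer's refusal direction is the normalized difference between mean activations on refused and non-refused prompts, and where the direction is removed from the output space of a weight matrix via W' = W - scale * r rᵀ W. The names refusal_direction, ablate_matrix, and scale, and all the toy numbers, are hypothetical. The model card does not say how the direction is applied to the one-dimensional layernorm weights, so only the matrix case is shown.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(refused_acts, accepted_acts):
    """Unit vector pointing from the mean 'accepted' activation to the
    mean 'refused' activation (assumed difference-of-means recipe)."""
    diff = [a - b for a, b in zip(mean_vector(refused_acts),
                                  mean_vector(accepted_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate_matrix(weight, direction, scale=1.0):
    """Return W - scale * r r^T W for a (d_out x d_in) matrix whose rows
    index the hidden dimension, as with o_proj.weight and down_proj.weight.
    scale=1.0 removes the component along r entirely; smaller values
    remove only part of it (hypothetical knob for the trade-off)."""
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    proj = [sum(direction[i] * weight[i][j] for i in range(d_out))
            for j in range(d_in)]
    return [[weight[i][j] - scale * direction[i] * proj[j]
             for j in range(d_in)]
            for i in range(d_out)]

# Toy activations for two layers with hidden size 2 (made-up numbers).
refused = {0: [[1.0, 0.0], [3.0, 0.0]], 1: [[0.0, 2.0], [0.0, 4.0]]}
accepted = {0: [[0.0, 0.0], [0.0, 0.0]], 1: [[0.0, 1.0], [0.0, 1.0]]}
weights = {0: [[1.0, 2.0], [3.0, 4.0]], 1: [[5.0, 6.0], [7.0, 8.0]]}

# One direction per layer, instead of one shared direction for all layers.
directions = {layer: refusal_direction(refused[layer], accepted[layer])
              for layer in weights}
ablated = {layer: ablate_matrix(weights[layer], directions[layer])
           for layer in weights}
```

In this toy setup each layer ends up with a different direction, and after ablation neither weight matrix can write along its own layer's direction. In a real model the activations would come from forward passes over two prompt sets and the arithmetic would be done with a tensor library.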
Use Cases & Next Steps
This abliterated model is primarily designed as a neutral starting point for developers. Fine-tuning is crucial to:
- Adjust the model to reduce over-censoring.
- Maintain a balance between neutrality and overall usability and intelligence.
It provides a flexible base for creating models with customized censorship profiles.