jenerallee78/gemma-4-26B-A4B-it-ara-abliterated
jenerallee78/gemma-4-26B-A4B-it-ara-abliterated is an uncensored 26-billion-parameter Gemma 4 model whose weights were edited with Adaptive Refusal Abliteration (ARA) to remove alignment guardrails. It reports a lower refusal rate (7.7%) and higher compliance quality (4.6/5) than other abliterated versions, while keeping KL divergence from the base model low. It is designed for research purposes where compliance with a wider range of prompts is desired, including prompts the original model would refuse.
Model Overview
This model, jenerallee78/gemma-4-26B-A4B-it-ara-abliterated, is an uncensored version of Google's Gemma 4 26B-A4B-it, created using a 2-pass Adaptive Refusal Abliteration (ARA) technique. This method precisely edits model weights to remove safety guardrails while preserving overall model quality.
Key Differentiators
- Lowest Refusal Rate: Achieves a 7.7% refusal rate (StrongREJECT) and 5.7% (3x Ensemble) on HarmBench prompts, significantly lower than other abliterated models.
- High Compliance Quality: Scores 4.6/5 for compliance quality, indicating effective and relevant responses to prompts the base model would refuse.
- Minimal Quality Degradation: Exhibits the lowest KL divergence (0.1299) from the vanilla Gemma 4 26B-A4B-it among published abliterations, which suggests that general-purpose behavior is largely preserved.
- Multimodal Capabilities: Based on Gemma 4, it includes a SigLIP-based vision encoder, supporting multimodal inputs with 280 soft tokens per image.
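The KL divergence figure above compares the next-token distributions of the base and edited models. The following is a minimal sketch of how such a number could be computed, not the evaluation code behind the reported 0.1299. The direction KL(base || edited), the averaging over token positions, and the function names `softmax`, `kl_divergence`, and `mean_kl` are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary.

    eps guards against log(0); it is an illustrative choice, not a standard.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def mean_kl(base_logits, edited_logits):
    """Average KL(base || edited) over token positions.

    Each argument is a list of per-position logit vectors, assumed to come
    from running both models on the same prompts (hypothetical setup).
    """
    kls = [kl_divergence(softmax(b), softmax(e))
           for b, e in zip(base_logits, edited_logits)]
    return sum(kls) / len(kls)

# Toy example with a 3-token vocabulary and two positions.
base = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
edited = [[1.8, 1.1, 0.2], [0.4, 0.7, 2.9]]
print(f"mean KL: {mean_kl(base, edited):.4f}")
```

A value near zero means the edited model assigns almost the same probabilities as the base model on the evaluated prompts; larger values indicate more drift.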
Technical Details
- Base Architecture: Gemma 4 26B-A4B-it (MoE with 128 experts, 8 active, ~4B active parameters).
- Context Length: 262,144 tokens (256K).
- Abliteration Method: Employs a 2-pass ARA technique targeting self_attn.o_proj and mlp.down_proj in layers 13-24 to suppress refusal behavior.
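To illustrate what "128 experts, 8 active" means in the architecture line above, here is a minimal sketch of top-k expert routing for a single token. It shows the general mixture-of-experts idea only. The router details (softmax renormalized over the selected experts) and the names `NUM_EXPERTS`, `TOP_K`, and `route_token` are assumptions and may not match how Gemma 4 actually routes tokens.

```python
import math
import random

NUM_EXPERTS = 128  # total experts per MoE layer, from the model card
TOP_K = 8          # experts active per token, from the model card

def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return {expert_index: weight}.

    Weights are a softmax over the selected experts only. This is one common
    scheme, assumed here for illustration.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    m = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - m) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Toy router output for one token (random values, fixed seed).
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)
print(f"active experts: {sorted(weights)}")
print(f"fraction of experts used: {len(weights) / NUM_EXPERTS:.4f}")
```

Because only 8 of 128 experts run for each token, the compute per token tracks the roughly 4B active parameters rather than the full 26B.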
Use Cases
This model is intended for research purposes where the removal of safety guardrails is a specific requirement. It is suitable for exploring model behavior without alignment constraints and for applications requiring high compliance across a broad spectrum of prompts. Users must assume full responsibility for its deployment due to its uncensored nature.