TrevorJS/gemma-4-26B-A4B-it-uncensored
TrevorJS/gemma-4-26B-A4B-it-uncensored is a 26-billion-parameter instruction-tuned Gemma-4 model developed by TrevorJS. It has been modified to remove refusal behaviors, reaching a 0.7% refusal rate across 686 prompts from four datasets while maintaining response quality. The modification uses norm-preserving biprojected abliteration together with Expert-Granular Abliteration (EGA) on the MoE expert weights.
Overview
TrevorJS/gemma-4-26B-A4B-it-uncensored is a 26-billion-parameter instruction-tuned model derived from Google's Gemma-4-26B-A4B-it and modified to reduce refusal behaviors. It applies two abliteration techniques: norm-preserving biprojected abliteration on the dense pathway, and Expert-Granular Abliteration (EGA) on all 128 MoE expert down_proj slices in each layer. The aim is to lower the refusal rate sharply without degrading response quality.
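As a rough illustration of what a norm-preserving, expert-granular weight edit could look like, the sketch below projects a refusal direction out of each expert's down_proj matrix and then restores the original column norms. It is not the author's implementation. The matrix layout (rows are hidden/output dimensions), the choice to preserve per-column norms, the toy sizes, and all function names are assumptions made for this example.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def unit(v):
    n = norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def ablate_norm_preserving(W, r):
    """Remove direction r from the output space of W, keeping column norms.

    Assumed layout: W[i][j] with i over hidden (output) dims and j over
    intermediate (input) dims, so each column is the vector that one
    intermediate neuron writes into the residual stream.
    """
    r = unit(r)
    n_out, n_in = len(W), len(W[0])
    out = [[0.0] * n_in for _ in range(n_out)]
    for j in range(n_in):
        col = [W[i][j] for i in range(n_out)]
        coeff = dot(r, col)
        # Project the column onto the subspace orthogonal to r.
        proj = [c - coeff * ri for c, ri in zip(col, r)]
        old, new = norm(col), norm(proj)
        # Rescale to the original norm; scaling keeps the column orthogonal to r.
        scale = old / new if new > 1e-12 else 0.0
        for i in range(n_out):
            out[i][j] = proj[i] * scale
    return out

def ablate_experts(expert_weights, r):
    """Expert-granular step: edit every expert's down_proj slice separately."""
    return [ablate_norm_preserving(W, r) for W in expert_weights]

# Toy stand-in data; the model described above has 128 experts per layer.
rng = random.Random(0)
HIDDEN, INTER, N_EXPERTS = 4, 3, 5
r = [rng.gauss(0, 1) for _ in range(HIDDEN)]
experts = [[[rng.gauss(0, 1) for _ in range(INTER)] for _ in range(HIDDEN)]
           for _ in range(N_EXPERTS)]
edited = ablate_experts(experts, r)
```

After the edit, no expert can write along the refusal direction, yet each column keeps its original magnitude, which is one plausible reading of "norm-preserving" in this context.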
Key Capabilities
- Reduced Refusal Behavior: Achieves a 0.7% refusal rate across 686 prompts from four independent datasets (JailbreakBench, tulu-harmbench, NousResearch/RefusalDataset, mlabonne/harmful_behaviors).
- Quality Preservation: Maintains response quality on harmless prompts, with a response length ratio of approximately 1.01, indicating that output length is essentially unchanged.
- Advanced Abliteration: Uses a pipeline of residual activation collection, Winsorization, per-layer refusal direction computation, and orthogonalization, followed by norm-preserving weight modification.
- Expert-Granular Abliteration (EGA): Applies abliteration at the level of individual experts within the Mixture-of-Experts (MoE) architecture, which distinguishes it from standard methods.
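The direction-finding steps named in the pipeline above (Winsorization, per-layer refusal direction, orthogonalization) can be sketched on toy data as follows. This is a minimal sketch under stated assumptions, not the model's actual procedure: it assumes Winsorization clips each activation dimension at empirical quantiles, that the refusal direction is a difference of mean activations between harmful and harmless prompts, and that orthogonalization is taken against the harmless mean. The function names, the quantile value, and the activation numbers are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def unit(v):
    n = norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def winsorize_columns(rows, q=0.05):
    """Clip each dimension to its [q, 1-q] empirical range across samples."""
    n = len(rows)
    lo_idx = int(q * (n - 1))
    hi_idx = (n - 1) - lo_idx
    clipped_cols = []
    for col in zip(*rows):
        s = sorted(col)
        lo, hi = s[lo_idx], s[hi_idx]
        clipped_cols.append([min(max(x, lo), hi) for x in col])
    return [list(row) for row in zip(*clipped_cols)]

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(harmful, harmless, q=0.05):
    """Difference of Winsorized means, orthogonalized against the harmless mean."""
    mu_bad = mean_vector(winsorize_columns(harmful, q))
    mu_ok = mean_vector(winsorize_columns(harmless, q))
    diff = [a - b for a, b in zip(mu_bad, mu_ok)]
    h = unit(mu_ok)
    c = dot(diff, h)
    # Drop the component parallel to the harmless mean (assumed target).
    ortho = [d - c * hi for d, hi in zip(diff, h)]
    return unit(ortho)

# Hypothetical residual activations for one layer; the last harmful row
# contains an outlier that Winsorization is meant to tame.
harmful = [[2.0, 0.1, 1.0], [2.2, -0.1, 1.1], [1.9, 0.0, 0.9],
           [2.1, 0.2, 1.0], [50.0, 0.0, 1.0]]
harmless = [[0.1, 0.1, 1.0], [0.0, -0.1, 1.1], [-0.1, 0.0, 0.9],
            [0.2, 0.2, 1.0], [0.0, 0.0, 1.0]]
activations = {0: (harmful, harmless)}  # layer index -> (harmful, harmless)

# One unit-length direction per layer.
directions = {layer: refusal_direction(bad, ok, q=0.25)
              for layer, (bad, ok) in activations.items()}
```

Clipping before averaging keeps a single extreme activation from dominating the mean, so the resulting direction reflects the typical difference between the two prompt sets.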
Good For
- Applications requiring a highly compliant and uncensored large language model.
- Use cases where avoiding AI identity disclaimers or refusals is critical for user experience.
- Research into model safety, alignment, and the effectiveness of abliteration techniques.