wangzhang/gemma-4-E2B-it-abliterated

Vision · Concurrency Cost: 1 · Model Size: 5.1B · Quant: BF16 · Ctx Length: 32k · Tool Calling: Supported · Published: Apr 10, 2026 · License: gemma · Architecture: Transformer

wangzhang/gemma-4-E2B-it-abliterated is an uncensored version of Google's Gemma 4 E2B-it, a multimodal (text, vision, and audio) model with approximately 5.1 billion parameters. It was created by editing the weights directly, which sidesteps Gemma 4's inherent resistance to LoRA-based abliteration. The result has a substantially lower refusal rate while staying very close to the base model's output distribution. It is intended primarily for research into model safety and censorship circumvention, and it responds compliantly across a wide range of prompts.


Overview

This model, wangzhang/gemma-4-E2B-it-abliterated, is an uncensored variant of Google's Gemma 4 E2B-it. It is a multimodal model (text, vision, audio) with ~5.1 billion parameters; the "E2B" in its name is the "Effective 2B" designation used within the Gemma 4 family. Instead of LoRA-based abliteration, this version uses direct weight editing to overcome Gemma 4's strong resistance to low-rank perturbations, a resistance that stems from its double-norm and Per-Layer Embeddings (PLE) architecture.

Key Capabilities & Methodology

  • Direct Weight Editing: Removes refusal behavior by modifying the base weights directly, projecting refusal directions out of the weights via orthogonal projection while preserving row magnitudes.
  • Norm-Preserving Techniques: Preserving weight norms is critical for maintaining model integrity, given Gemma 4's unique normalization pathways.
  • High-Precision Projection: Performs the projection in float32 to prevent signal loss.
  • Optimized Steering: Employs Winsorized steering vectors and multi-objective Optuna TPE search to minimize KL divergence while reducing refusal rates.
  • Multimodal Functionality: Abliteration targeted only the text-decoder weights; the vision and audio encoders are untouched and remain functional.
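The Winsorization and norm-preserving projection steps listed above can be illustrated with a small NumPy sketch. This is not the author's code. The function names, the 5%/95% clipping quantiles, and the convention that each row of the weight matrix lives in the same space as the direction are assumptions made for illustration. The sketch builds a unit direction from the difference of Winsorized means of two activation sets. It then removes that direction from every row of a weight matrix in float32 and rescales each row back to its original norm.

```python
import numpy as np


def winsorize(x: np.ndarray, lower_q: float = 0.05, upper_q: float = 0.95) -> np.ndarray:
    """Clip each column of x (samples x features) to its [lower_q, upper_q] quantile range."""
    lo = np.quantile(x, lower_q, axis=0)
    hi = np.quantile(x, upper_q, axis=0)
    return np.clip(x, lo, hi)


def direction_from_activations(acts_a: np.ndarray, acts_b: np.ndarray, q: float = 0.05) -> np.ndarray:
    """Unit vector along the difference of Winsorized means of two activation sets (a minus b).

    Hypothetical helper: the quantile q is an illustrative choice, not a documented setting.
    """
    diff = winsorize(acts_a, q, 1 - q).mean(axis=0) - winsorize(acts_b, q, 1 - q).mean(axis=0)
    return diff / np.linalg.norm(diff)


def project_out_preserving_row_norms(w: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component along r from every row of w, then restore each row's original norm.

    Assumes rows of w share r's dimension. All arithmetic is done in float32 and the
    result is cast back to w's dtype. A row exactly parallel to r becomes zero and
    is left at zero, since it cannot be rescaled.
    """
    w32 = w.astype(np.float32)
    r32 = r.astype(np.float32)
    r32 /= np.linalg.norm(r32)
    orig_norms = np.linalg.norm(w32, axis=1, keepdims=True)
    # Orthogonal projection of each row: w_row - (w_row . r) r
    w_proj = w32 - np.outer(w32 @ r32, r32)
    new_norms = np.linalg.norm(w_proj, axis=1, keepdims=True)
    # Rescale rows to their original magnitudes; zero rows keep a scale of 1.
    scale = np.divide(orig_norms, new_norms, out=np.ones_like(orig_norms), where=new_norms > 0)
    return (w_proj * scale).astype(w.dtype)
```

Rescaling after the projection does not reintroduce the removed component, because multiplying a row by a scalar keeps it orthogonal to the direction. This is one plausible reading of "preserving row magnitudes"; the model's actual per-layer procedure may differ.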

Performance & Evaluation

  • Refusal Rate: 9/100 refusals on a rigorous 100-prompt evaluation set, down from 99/100 for the base model.
  • KL Divergence: Maintains a low KL divergence of 0.0004 from the base model, indicating high fidelity.
  • Rigorous Evaluation: Emphasizes honest evaluation, using a sufficient generation length (>=100 tokens), hybrid refusal detection (keyword matching plus an LLM judge), and challenging, diverse prompts so that refusal rates are measured accurately.
  • Resource Efficient: Requires approximately 10 GB of VRAM in BF16, which fits on consumer GPUs, and can run on 6 GB cards with 4-bit quantization.
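The metrics and memory figures above can be approximated with a few plain-Python helpers. This is a sketch under stated assumptions, not the author's evaluation harness. The refusal marker list is hypothetical. A response is assumed to count as a refusal if either the keyword check or an optional judge callable flags it. KL divergence is computed here as KL(base || edited) over a single next-token distribution given as logits. The memory estimate covers weights only.

```python
import math
from typing import Callable, Optional, Sequence

# Hypothetical marker list; a real harness would use a larger, curated set.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i won't", "as an ai")


def looks_like_refusal(text: str) -> bool:
    """Cheap keyword check: True if any refusal marker appears (case-insensitive)."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def refusal_rate(responses: Sequence[str], judge: Optional[Callable[[str], bool]] = None) -> float:
    """Fraction of responses flagged by the keyword check or by an optional judge.

    `judge` stands in for an LLM judge (hypothetical interface): it should return
    True when it considers a response a refusal.
    """
    if not responses:
        raise ValueError("need at least one response")
    flagged = sum(1 for r in responses if looks_like_refusal(r) or (judge is not None and judge(r)))
    return flagged / len(responses)


def _log_softmax(logits: Sequence[float]) -> list:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum_exp for v in logits]


def kl_divergence(base_logits: Sequence[float], edited_logits: Sequence[float]) -> float:
    """KL(base || edited) in nats for two logit vectors over the same vocabulary."""
    log_p = _log_softmax(base_logits)
    log_q = _log_softmax(edited_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes); excludes activations and KV cache."""
    return n_params * bits_per_param / 8 / 1e9


print(f"BF16 weights:  {weight_memory_gb(5.1e9, 16):.2f} GB")
print(f"4-bit weights: {weight_memory_gb(5.1e9, 4):.2f} GB")
```

At 16 bits per parameter, 5.1 billion parameters come to about 10.2 GB of weights, consistent with the ~10 GB BF16 figure. At 4 bits the weights take about 2.55 GB, which leaves room for activations and the KV cache on a 6 GB card.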

Use Cases

This model is intended for research purposes only, specifically for studying model safety, censorship mechanisms, and the effectiveness of abliteration techniques. Users should be aware that safety guardrails have been removed.