marshmallow626/gemma-4-E4B-it-OBLITERATED

Vision · Concurrency Cost: 1 · Model Size: 7.9B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: May 16, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

marshmallow626/gemma-4-E4B-it-OBLITERATED is a 7.9 billion parameter instruction-tuned variant of Google's Gemma 4 E4B, modified to remove its hard-refusal guardrails. Using the OBLITERATUS method, which modifies 21 of the model's 42 layers, it reports a 0% hard refusal rate. It targets research and creative exploration where uncensored responses are desired, and runs on hardware ranging from desktops to mobile phones.


Model Overview

marshmallow626/gemma-4-E4B-it-OBLITERATED is a 7.9 billion parameter model based on Google's Gemma 4 E4B architecture with its safety guardrails removed. It was produced with the OBLITERATUS method and reports a 0% hard refusal rate, meaning it is reported never to decline a request outright. The abliteration process combined whitened SVD, attention head surgery, and winsorized activations, applied to 21 of the model's 42 layers.
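The overview mentions winsorized activations without defining the term. As a rough illustration only, winsorizing clamps extreme values to chosen quantile bounds so that a few outliers do not dominate later statistics. The sketch below is a generic, standard-library version with a hypothetical `winsorize` helper, made-up sample values, and arbitrary quantile choices; it is not taken from the OBLITERATUS tool, whose actual implementation and thresholds are not described here.

```python
def winsorize(values, lower_q=0.05, upper_q=0.95):
    """Clamp values that fall outside the [lower_q, upper_q] quantile range.

    Hypothetical helper for illustration; quantile bounds are assumptions.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Simple floor-based rank: each bound is an existing sample value.
    lo = ordered[int(lower_q * (n - 1))]
    hi = ordered[int(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


# Made-up activation values with one large outlier (25.0).
activations = [0.1, -0.2, 0.05, 0.3, -0.15, 25.0, 0.2, -0.1, 0.0, 0.15]
clipped = winsorize(activations, lower_q=0.1, upper_q=0.9)
print(clipped)
```

In this toy input the outlier is pulled down to the upper bound while the remaining values are left unchanged, which is the general effect winsorization is used for.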

Key Capabilities

  • Guardrail Removal: Reports a 0% hard refusal rate, providing uncensored responses for research and creative applications.
  • Architectural Fixes: Version 3 addresses a critical bug affecting Gemma 4's shared KV weights, so that all 720 tensors are intact, improving quality and coherence.
  • Autonomous Development: The model was created almost entirely by an AI agent with minimal human intervention, including self-diagnosis and patching of the OBLITERATUS tool.
  • Broad Compatibility: Provided in GGUF format for llama.cpp, Ollama, LM Studio, and mobile devices (iPhone, Android), alongside Safetensors for Hugging Face Transformers.

Good For

  • Research and Red-Teaming: Ideal for exploring model limitations, safety mechanisms, and generating content without inherent refusal.
  • Creative Exploration: Suitable for use cases requiring unrestricted text generation, roleplay, or content creation that might otherwise be filtered.
  • Edge Device Deployment: Optimized GGUF quants (e.g., Q4_K_M at 4.9 GB) enable efficient local inference on mobile phones and other resource-constrained hardware.
  • Understanding LLM Architecture: Offers insights into the Gemma 4 architecture and the effects of surgical modifications on model behavior.
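To put the quoted Q4_K_M file size in context, a back-of-the-envelope estimate of average bits per weight can be sketched as follows. It assumes the 4.9 GB figure is in decimal gigabytes, that the file holds all 7.9 billion parameters, and that metadata overhead is negligible; none of these is confirmed by the card, and `bits_per_weight` is a hypothetical helper.

```python
def bits_per_weight(file_size_gb, params_billions):
    """Estimate average bits stored per parameter.

    Assumes decimal GB (1e9 bytes) and ignores file metadata overhead.
    """
    size_bits = file_size_gb * 1e9 * 8
    return size_bits / (params_billions * 1e9)


# Figures quoted on this card: Q4_K_M at 4.9 GB, 7.9B parameters.
bpw = bits_per_weight(4.9, 7.9)
print(f"Q4_K_M: about {bpw:.2f} bits per weight")

# For comparison, an 8-bit format stores one byte per parameter.
fp8_size_gb = 7.9 * 1e9 * 1 / 1e9
print(f"8-bit weights: about {fp8_size_gb:.1f} GB")
```

Under these assumptions the estimate comes out near 5 bits per weight, roughly 60% of the size of one-byte-per-parameter storage.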