Deepdive404/gemma-4-E4B-it-OBLITERATED

VISIONConcurrency Cost:1Model Size:7.9BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Apr 24, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Deepdive404/gemma-4-E4B-it-OBLITERATED is a 7.9 billion parameter language model based on Google's Gemma 4 E4B architecture, specifically fine-tuned to remove guardrails and refusal behaviors. Utilizing the OBLITERATUS method, this model achieves a 0% hard refusal rate, making it suitable for research and creative exploration without content restrictions. It maintains the original 32768 token context length and is optimized for deployment on various platforms, including mobile devices, with GGUF quantizations available.

Loading preview...

Overview

Deepdive404/gemma-4-E4B-it-OBLITERATED is a 7.9 billion parameter model built upon Google's Gemma 4 E4B architecture. Its primary distinction is the complete removal of guardrails and refusal mechanisms, achieved through the OBLITERATUS method, which involved attention head surgery and winsorized activations across 21 of 42 layers. This model boasts a 0% hard refusal rate, ensuring it will not decline any request.

Key Capabilities

  • Guardrail-Free Operation: Surgically modified to eliminate hard refusals and safety lectures, providing uncensored responses.
  • New Architecture Support: Built on the gemma4 architecture, requiring updated tools like Ollama 0.20+ or llama.cpp build b8665+.
  • Mobile Device Compatibility: Optimized GGUF quantizations (e.g., Q4_K_M at 4.9 GB) enable offline execution on iPhones (15 Pro/16 Pro) and Android flagships with 8GB+ RAM.
  • Autonomous Development: Notably, this model was created almost entirely by an AI agent with minimal human intervention, including self-diagnosis and patching of the OBLITERATUS tool.

Limitations & Quality

While guardrails are removed, as a 4B parameter model, it has inherent limitations. It exhibits ~28% soft deflection (changing topic) and ~20% degenerate outputs (repetition loops), which can be mitigated with recommended parameters (repeat_penalty: 1.1). The abliteration process did not enhance the model's core intelligence, only its refusal behavior.

Recommended Usage

Optimal performance is achieved with specific generation parameters: temperature: 0.7, top_p: 0.9, top_k: 40, and repeat_penalty: 1.1. A simple system prompt like "You are an AI language model. Respond to the user's input." is recommended for best results.