heterodoxin/gemma-4-e4b-it-apostate

Vision | Concurrency Cost: 1 | Model Size: 7.9B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 1, 2026 | Architecture: Transformer | Cold

heterodoxin/gemma-4-e4b-it-apostate is a 7.9-billion-parameter variant of Gemma-4-e4b-it, developed by heterodoxin, with a 32,768-token context length. It is an uncensored variant of the base Google Gemma model, produced by directly editing the model weights to remove refusal behavior. It is intended for applications that need a highly compliant language model without built-in content moderation or refusal mechanisms.


heterodoxin/gemma-4-e4b-it-apostate Overview

This model is an uncensored version of the google/gemma-4-e4b-it base model, developed by heterodoxin. It has 7.9 billion parameters and a 32,768-token context length. Rather than relying on fine-tuning or adapters, the Apostate method removes refusal behavior permanently by editing the model's weights directly.

Key Capabilities & Method

  • Uncensored Responses: Designed to respond to requests that the base Gemma model would typically refuse.
  • Direct Weight Editing: Refusal behavior is eliminated by projecting out the residual-stream direction responsible for refusal from the model's weights.
  • Contrastive Co-vector Edit: Applies the operator E = I − R Dᵀ, where D = R − W, to remove refusal while minimizing impact on benign behavior. The edit targets the 'writer' side of the model, that is, the attention output projections and MLP down-projections.
  • Permanent Modification: The uncensoring is baked directly into the model weights, so no special system prompts, adapters, or runtime hooks are required.
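
To make the idea of projecting a direction out of the 'writer' weights concrete, here is a minimal NumPy sketch. It is not the Apostate implementation: the helper names `rank_one_edit` and `project_out`, the matrix shapes, and the random data are all assumptions made for illustration, and it shows only the plain orthogonal case in which the co-vector equals the unit direction itself. The R and W of the E = I − R Dᵀ operator are not defined on this page, so they are not reproduced here.

```python
import numpy as np


def rank_one_edit(w_out, r, d):
    """Return (I - r d^T) @ w_out without building the d_model x d_model matrix.

    w_out: (d_model, d_hidden) matrix that writes into the residual stream.
    r:     (d_model,) direction to remove.
    d:     (d_model,) co-vector that reads how much of r is present.
    """
    return w_out - np.outer(r, d @ w_out)


def project_out(w_out, direction):
    """Orthogonal special case: the co-vector is the unit direction itself."""
    r = direction / np.linalg.norm(direction)
    return rank_one_edit(w_out, r, r)


# Toy data standing in for a writer matrix and a hypothetical refusal direction.
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 16
w_out = rng.standard_normal((d_model, d_hidden))
refusal_dir = rng.standard_normal(d_model)

w_edited = project_out(w_out, refusal_dir)

# Whatever hidden activation is fed in, the edited matrix writes nothing
# along the removed direction.
h = rng.standard_normal(d_hidden)
component = refusal_dir @ (w_edited @ h)
print(f"component along removed direction: {component:.2e}")
```

Because the change is folded into the stored matrix, inference needs no extra hook or adapter, which is the sense in which such an edit is permanent. Applying `project_out` a second time with the same direction leaves the matrix unchanged.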

Performance Metrics

Evaluations on held-out prompts from the JailbreakBench and harmful_behaviors test splits show a large reduction in refusal rate:

  • Refusal Rate: Reduced from 96.0% (base) to 36.0% (Apostate), a drop of 60.0 percentage points.
  • Comply Rate: Achieves 64.0% compliance.
  • Harmless KL: Maintains a low token-distribution shift of 0.119 nats on harmless prompts, indicating minimal disruption to general model behavior.
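
As a rough illustration of how figures like these can be computed, the sketch below measures a mean per-position KL divergence in nats between two sets of next-token logits, and a refusal rate from a list of per-prompt flags. It is not the evaluation code behind the numbers above: the function names, the KL direction (base relative to edited), the averaging over positions, and the toy inputs are all assumptions.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def mean_kl_nats(base_logits, edited_logits):
    """Mean over positions of KL(base || edited), in nats (natural log)."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(edited_logits)
    kl_per_position = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl_per_position.mean())


def refusal_rate(refused_flags):
    """Percentage of prompts flagged as refused."""
    return 100.0 * sum(refused_flags) / len(refused_flags)


# Toy logits: 5 positions over a 32-token vocabulary (illustrative only).
rng = np.random.default_rng(0)
base_logits = rng.standard_normal((5, 32))
edited_logits = base_logits + 0.1 * rng.standard_normal((5, 32))

kl_same = mean_kl_nats(base_logits, base_logits)
kl_shifted = mean_kl_nats(base_logits, edited_logits)

# Toy flags: 3 of 4 prompts refused.
toy_rate = refusal_rate([True, True, True, False])
print(kl_same, kl_shifted, toy_rate)
```

Identical logits give a KL of zero, and any difference between the two distributions gives a positive value, so a small result on harmless prompts indicates that the edited model's token distribution stays close to the base model's.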

Good For

  • Applications requiring a highly compliant and unfiltered language model.
  • Research into model safety, bias, and control mechanisms.
  • Use cases where the base model's refusal behavior is undesirable or restrictive.