llmfan46/gemma-4-26B-A4B-it-uncensored-heretic

  • Vision
  • Concurrency Cost: 2
  • Model Size: 26B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Apr 7, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights
  • Cold

llmfan46/gemma-4-26B-A4B-it-uncensored-heretic is a 26-billion-parameter instruction-tuned causal language model, a decensored version of Google DeepMind's Gemma-4-26B-A4B-it. It produces 89% fewer refusals than the original (11/100 versus 100/100), and its KL divergence from the original is 0.0468, which indicates that model quality is largely preserved. It is intended for use cases that need fewer content restrictions while retaining the Gemma 4 family's reasoning and multimodal capabilities and its 256K token context window. Note that the listing metadata above gives a context length of 32k.

Overview

llmfan46/gemma-4-26B-A4B-it-uncensored-heretic is a 26-billion-parameter instruction-tuned variant of Google DeepMind's Gemma-4-26B-A4B-it. It was decensored with Heretic v1.2.0 using the Arbitrary-Rank Ablation (ARA) method, targeting the attn.o_proj components.
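The source does not describe how ARA works internally. As a rough illustration of what it means to ablate a direction from an output projection, the sketch below implements plain rank-1 directional ablation, W' = W - r rᵀ W. This is a simpler stand-in technique and not Heretic's ARA implementation; the function names, the toy weight matrix, and the direction vector are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(weight, direction):
    """Return W' = W - r r^T W for a unit vector r.

    weight is a list of rows (out_dim x in_dim); direction has length out_dim.
    After ablation, no output of W' has a component along r.
    """
    r = normalize(direction)
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [[weight[i][j] - r[i] * proj[j] for j in range(cols)] for i in range(rows)]


def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Toy 2x2 "output projection" and a hypothetical direction to remove.
toy_weight = [[1.0, 2.0], [3.0, 4.0]]
toy_direction = [1.0, 0.0]

ablated = ablate_direction(toy_weight, toy_direction)
output = matvec(ablated, [1.0, 1.0])
component_along_r = sum(o * r for o, r in zip(output, normalize(toy_direction)))
```

In this toy case the ablated matrix can no longer write anything along the removed direction, while the orthogonal component of its output is unchanged.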

Key Differentiators

  • Reduced Refusals: Produces 89% fewer refusals than the original model (11/100 versus 100/100).
  • Preserved Quality: Has a KL divergence of 0.0468 from the original Gemma-4 model, indicating that its core capabilities and knowledge are largely preserved.
  • Multimodal Capabilities: Inherits the Gemma 4 family's ability to process text, image, and video inputs, with a 256K token context window.
  • Reasoning: Designed with strong reasoning capabilities, including a configurable thinking mode for step-by-step processing.
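The two headline numbers above can be made concrete with a short sketch. It shows how a relative refusal reduction and a KL divergence between two next-token distributions are computed in general. The toy distributions are invented for illustration, and both the direction of the divergence (original relative to modified) and the use of natural logarithms are assumptions, since the source does not say how its 0.0468 figure was measured.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def relative_reduction(before, after):
    """Fraction by which a count dropped, e.g. refusals per 100 prompts."""
    return (before - after) / before


# Hypothetical next-token probabilities for the same prompt.
original_probs = [0.70, 0.20, 0.10]
modified_probs = [0.65, 0.23, 0.12]

kl = kl_divergence(original_probs, modified_probs)

# Refusal counts from the model card: 100/100 before, 11/100 after.
reduction = relative_reduction(100, 11)
print(f"refusal reduction: {reduction:.0%}")
```

A divergence of zero would mean the two models assign identical probabilities; small positive values mean the modified model's predictions stay close to the original's.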

Performance

Alongside the large drop in refusals, benchmark scores fall slightly relative to the original:

  • PIQA Accuracy: 91.73% (vs. 92.06% for original), a drop of 0.33 percentage points.
  • MMLU Accuracy: 80.81% (vs. 82.48% for original), a drop of 1.67 percentage points.
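The size of each change can be checked with a few lines of arithmetic. This minimal sketch uses only the scores listed above; the dictionary layout and function name are illustrative choices.

```python
# (original score, decensored score) in percent, taken from the list above.
scores = {
    "PIQA": (92.06, 91.73),
    "MMLU": (82.48, 80.81),
}


def score_deltas(scores):
    """Return the change in percentage points for each benchmark."""
    return {name: round(new - old, 2) for name, (old, new) in scores.items()}


deltas = score_deltas(scores)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} percentage points")
```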

Good For

  • Applications requiring a powerful, multimodal LLM with significantly reduced content restrictions.
  • Tasks benefiting from strong reasoning, coding, and agentic workflows.
  • Use cases involving long context understanding (up to 256K tokens) and interleaved multimodal inputs (text, image, video).