llmfan46/gemma-4-26B-A4B-it-qat-q4_0-unquantized-uncensored-heretic

  • Vision
  • Concurrency Cost: 2
  • Model Size: 26B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Jun 11, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

The llmfan46/gemma-4-26B-A4B-it-qat-q4_0-unquantized-uncensored-heretic is a 26-billion-parameter (25.2B total) Mixture-of-Experts (MoE) model. It is derived from the 26B A4B variant of Google DeepMind's Gemma 4 family. The model was decensored with the Heretic v1.2.0 tool using Arbitrary-Rank Ablation (ARA), which substantially reduces content refusals while largely preserving the original model's quality. It accepts text and image inputs and is aimed at multimodal use cases that call for less restrictive content generation.
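Heretic's actual ARA code is not reproduced here. The sketch below assumes that ARA generalizes classic single-direction ablation ("abliteration") to a refusal subspace of arbitrary rank k, as its name suggests. Under that assumption, the core operation is an orthogonal projection that removes the subspace from a weight matrix's outputs: W' = (I − QQᵀ)W, where the columns of Q are k orthonormal directions. The helper names, the matrix, and the "refusal directions" are hypothetical toy values. In a real model, the directions would be estimated from activations.

```python
import math

# Generic rank-k directional ablation sketch (NOT Heretic's implementation).
# Assumption: refusal directions live in the output space of the weight matrix.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def orthonormalize(vectors):
    """Gram-Schmidt: turn k linearly independent directions into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip directions already spanned by the basis
            basis.append([wi / norm for wi in w])
    return basis

def ablate_subspace(weight, directions):
    """Return (I - Q Q^T) W for a d_out x d_in matrix W given as a list of rows.

    Afterwards, no output of W' has a component along any of the directions.
    """
    basis = orthonormalize(directions)
    d_out, d_in = len(weight), len(weight[0])
    result = [list(row) for row in weight]
    for b in basis:
        # coeffs[j] = b^T W[:, j], computed from the ORIGINAL weight (basis is orthonormal)
        coeffs = [sum(b[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
        for i in range(d_out):
            for j in range(d_in):
                result[i][j] -= b[i] * coeffs[j]
    return result

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]            # hypothetical d_out=3, d_in=2 weight
refusal_dirs = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # hypothetical rank-2 refusal subspace
W_ablated = ablate_subspace(W, refusal_dirs)
print([[round(x, 6) for x in row] for row in W_ablated])  # [[0.0, 0.0], [-1.0, -1.0], [1.0, 1.0]]
```

Orthonormalizing first lets the projection be subtracted one basis vector at a time, which keeps the sketch dependency-free. A production tool would use a tensor library instead.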


Overview

This model, llmfan46/gemma-4-26B-A4B-it-qat-q4_0-unquantized-uncensored-heretic, is a decensored version of Google DeepMind's Gemma 4 26B A4B instruction-tuned model.

It uses a Mixture-of-Experts (MoE) architecture with 25.2 billion total parameters, of which only 3.8 billion are active per token. Its per-token compute is therefore comparable to that of a roughly 4B dense model.

Its main differentiator is a much lower refusal rate: 11/100 versus 100/100 for the original. This was achieved with Heretic v1.2.0 using Arbitrary-Rank Ablation (ARA). The KL divergence from the original model stays low at 0.0618, which indicates that the original model's quality and capabilities are largely preserved.

It supports a 256K-token context length and multimodal (text and image) inputs.
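The two metrics quoted for this model can be made concrete with a short sketch. The refusal figure is plain arithmetic: 11 refusals instead of 100 is an 89% reduction.

For the KL divergence, the sketch makes three assumptions, which are common for abliteration tools but not confirmed by this card:

  • The score compares the original and modified models' next-token distributions on harmless prompts.
  • It is averaged over prompts.
  • It is computed as KL(original ‖ modified) in nats.

The function names and probability values are hypothetical toy data, not measurements from this model.

```python
import math

def relative_reduction(after, before):
    """Share of the original refusals that the modified model no longer produces."""
    return 1.0 - after / before

def kl_divergence(p, q):
    """KL(P || Q) in nats over a shared vocabulary.

    Assumed direction: P = original model, Q = modified model.
    Terms with p_i == 0 contribute nothing by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mean_kl(original_dists, modified_dists):
    """Average per-prompt KL divergence over a set of prompts (hypothetical helper)."""
    pairs = list(zip(original_dists, modified_dists))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)

print(f"{relative_reduction(11, 100):.0%}")  # 89%

# Toy next-token distributions for two harmless prompts (made-up values).
original = [[0.7, 0.2, 0.1], [0.5, 0.5, 0.0]]
modified = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]
print(f"mean KL: {mean_kl(original, modified):.4f}")
```

A KL value near 0 means the modified model's output distribution is almost unchanged on the measured prompts. This is why a low score is read as evidence that quality was preserved.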

Key Capabilities

  • Decensored Output: Achieves 89% fewer refusals compared to the original model, providing less restricted content generation.
  • Multimodal Understanding: Processes text and image inputs, with a vision encoder of ~550M parameters.
  • Efficient Inference: Only 3.8B of the 25.2B total parameters are active per token, so per-token compute is far lower than for a dense model of the same size (all weights must still be held in memory).
  • Long Context: Supports a substantial 256K token context window.
  • Reasoning: Includes a built-in reasoning mode for step-by-step thought processes.
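The following toy top-k MoE layer illustrates why only a fraction of the parameters are active per token. It is a generic sketch, not Gemma 4's actual router. The number of experts (4), k = 2, the softmax gating over the selected experts, and the scaling "experts" are all hypothetical. Each token runs only its k highest-scoring experts, so per-token compute scales with the active parameter count rather than the total.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale, call_log, idx):
    """Hypothetical toy expert: scales its input and records that it was executed."""
    def expert(v):
        call_log.append(idx)
        return [scale * vi for vi in v]
    return expert

def moe_forward(x, router, experts, k):
    """Route token vector x to its top-k experts and mix their outputs.

    router: one weight vector per expert (dot product with x gives its score).
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(len(experts)), key=lambda e: scores[e], reverse=True)[:k]
    gates = softmax([scores[e] for e in top])  # renormalize over the selected experts only
    out = [0.0] * len(x)
    for g, e in zip(gates, top):
        y = experts[e](x)  # unselected experts are never run
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top

calls = []
experts = [make_expert(i + 1.0, calls, i) for i in range(4)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
out, top = moe_forward([1.0, 0.5], router, experts, k=2)
print(top)    # [2, 0]
print(calls)  # [2, 0]: only 2 of the 4 experts executed
print(f"active share of total params: {3.8 / 25.2:.1%}")  # 15.1%
```

Active parameters also include shared, non-expert weights such as attention and embeddings. The 3.8B/25.2B ratio (about 15%) is therefore not simply k divided by the number of experts.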

Good For

  • Applications requiring less restrictive content generation or creative freedom.
  • Multimodal tasks involving text and image inputs where censorship is a concern.
  • Scenarios needing efficient inference from a large model, since only 3.8B of its parameters are active per token.
  • Use cases demanding long-context understanding and reasoning.