sh0ck0r/gemma-4-26B-A4B-it-heretic

Vision | Concurrency Cost: 2 | Model Size: 26B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: May 25, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

sh0ck0r/gemma-4-26B-A4B-it-heretic is a 26-billion-parameter, instruction-tuned model from Google DeepMind's Gemma 4 family: a decensored version of google/gemma-4-26B-A4B-it created with the Heretic tool. It is multimodal, accepting text and image input, and is listed here with a 32,768-token context length. Its Mixture-of-Experts (MoE) architecture uses 3.8B active parameters for efficient inference. The model targets reasoning, coding, and agentic workflows, and differs from the original chiefly in refusing far fewer requests.

sh0ck0r/gemma-4-26B-A4B-it-heretic: Decensored Gemma 4 MoE Model

This model is a 26-billion-parameter, instruction-tuned variant from Google DeepMind's Gemma 4 family: a decensored version of google/gemma-4-26B-A4B-it created with Heretic v1.3.0. Its Mixture-of-Experts (MoE) architecture uses 3.8 billion active parameters, so inference speed is comparable to a 4B model despite the larger total parameter count. The model supports a context window of up to 256K tokens (the listing above shows a 32k context length for this deployment) and is multimodal, processing both text and image inputs.
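The relationship between total and active parameters can be made concrete with a little arithmetic. The sketch below uses only the two figures quoted in this card (26B total, 3.8B active); the function name `active_fraction` is illustrative and not part of any library.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of parameters that are active in an MoE model."""
    if total_params <= 0 or not 0 < active_params <= total_params:
        raise ValueError("expected 0 < active_params <= total_params")
    return active_params / total_params


TOTAL_PARAMS = 26e9    # total parameters, as quoted in the card
ACTIVE_PARAMS = 3.8e9  # active parameters, as quoted in the card

fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"{fraction:.1%} of parameters are active")  # prints "14.6% of parameters are active"
```

Note that the ratio describes compute, not memory: in a typical MoE deployment the full set of weights still has to be loaded, even though only a fraction is used at a time.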

Key Differentiators & Capabilities

  • Decensored Performance: Refusals drop to 28/100, compared with 100/100 for the original model, making it considerably more permissive.
  • Multimodal: Handles text and image inputs, with support for variable aspect ratios and resolutions.
  • Efficient MoE Architecture: Uses 3.8B active parameters out of 26B total, which gives it the inference speed of a much smaller model.
  • Enhanced Reasoning: Offers configurable thinking modes for step-by-step reasoning.
  • Coding & Agentic Capabilities: Shows improvements in coding benchmarks and supports native function-calling.
  • Long Context: Supports a context window of up to 256K tokens for complex tasks (this listing shows a 32k context length).
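Because the context window is shared between the prompt and the generated reply, it helps to budget tokens before sending a request. The following is a minimal sketch, assuming the 32,768-token context length shown in the listing; `max_output_tokens` is a hypothetical helper, and the prompt token count is assumed to come from whatever tokenizer your client uses.

```python
LISTED_CTX = 32_768  # context length shown in the listing (32k)


def max_output_tokens(prompt_tokens: int, ctx_limit: int = LISTED_CTX) -> int:
    """Return how many tokens remain for generation after the prompt.

    Returns 0 when the prompt alone already fills or exceeds the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(ctx_limit - prompt_tokens, 0)


print(max_output_tokens(30_000))  # prints 2768
print(max_output_tokens(40_000))  # prints 0
```

A result of 0 signals that the prompt should be shortened or split before the request is sent.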

Should you use this for your use case?

This model is a good fit if your application needs a capable multimodal language model with reduced content restrictions, particularly for reasoning, code generation, or agentic workflows. Its MoE architecture suits scenarios where fast inference matters, and its image input support helps applications that require image understanding. Because the model refuses far fewer requests than the original, implement your own safety measures if strict content moderation is required.
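One way to add your own safety layer is to check both the prompt and the reply before anything reaches the user. The sketch below is only an illustration of that pattern: `moderated_generate`, `is_allowed`, `BLOCKED_TERMS`, and `echo_model` are hypothetical names, the keyword check stands in for a real moderation policy, and `generate` is any callable that wraps your actual model client.

```python
from typing import Callable

# Placeholder policy for illustration only; a real deployment would
# use a proper moderation classifier or service instead.
BLOCKED_TERMS = ("example-banned-term",)


def is_allowed(text: str) -> bool:
    """Return False if the text contains any blocked term."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def moderated_generate(
    generate: Callable[[str], str],
    prompt: str,
    check: Callable[[str], bool] = is_allowed,
    fallback: str = "[withheld by content policy]",
) -> str:
    """Run the check on the prompt and on the reply, returning a fallback on failure."""
    if not check(prompt):
        return fallback
    reply = generate(prompt)
    return reply if check(reply) else fallback


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call, used only to exercise the gate."""
    return f"echo: {prompt}"


print(moderated_generate(echo_model, "hello"))                    # prints "echo: hello"
print(moderated_generate(echo_model, "say example-banned-term"))  # prints "[withheld by content policy]"
```

Checking the reply as well as the prompt matters here, since a permissive model may produce disallowed output from an innocuous-looking request.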