coder3101/gemma-4-26B-A4B-it-qat-q4_0-unquantized-heretic

VISIONConcurrency Cost:2Model Size:26BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 6, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The coder3101/gemma-4-26B-A4B-it-qat-q4_0-unquantized-heretic is a 26 billion parameter instruction-tuned multimodal language model, a decensored version of Google DeepMind's Gemma 4 26B A4B model. Utilizing the Arbitrary-Rank Ablation (ARA) method with Heretic v1.2.0, this model significantly reduces refusals compared to its original counterpart while maintaining strong performance. It features a 256K token context window and excels in reasoning, coding, and multimodal understanding across text and image inputs.

Loading preview...

Model Overview

This model, coder3101/gemma-4-26B-A4B-it-qat-q4_0-unquantized-heretic, is a 26 billion parameter instruction-tuned multimodal language model based on Google DeepMind's Gemma 4 family. It is a decensored variant, created using the Heretic v1.2.0 tool with the Arbitrary-Rank Ablation (ARA) method. This process significantly reduces model refusals (13/100 compared to 100/100 for the original) while introducing a minimal KL divergence of 0.0660.

Part of the Gemma 4 series, this model is optimized with Quantization-Aware Training (QAT) for efficient memory usage. It is a Mixture-of-Experts (MoE) architecture, featuring 25.2 billion total parameters but activating only 3.8 billion during inference, allowing for faster execution comparable to a 4B model. It supports a substantial 256K token context window and handles both text and image inputs.

Key Capabilities

  • Decensored Behavior: Significantly reduced refusal rates compared to the base Gemma 4 model.
  • Multimodal Understanding: Processes text and image inputs, with variable aspect ratio and resolution support.
  • Reasoning: Designed with highly capable reasoning abilities, including configurable thinking modes.
  • Long Context: Supports a 256K token context window for complex, long-form tasks.
  • Efficient Architecture: Mixture-of-Experts (MoE) design with 3.8B active parameters for fast inference.
  • Coding & Agentic Capabilities: Improved performance in coding benchmarks and native function-calling support.

Good for

  • Applications requiring a less restrictive, decensored language model.
  • Tasks involving complex reasoning and problem-solving.
  • Multimodal applications that integrate text and image data.
  • Code generation, completion, and agentic workflows.
  • Scenarios where efficient inference with a large parameter count is desired.