coder3101/gemma-4-26B-A4B-it-heretic

VISIONConcurrency Cost:2Model Size:26BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Apr 2, 2026License:apache-2.0Architecture:Transformer0.1K Open Weights Cold

The coder3101/gemma-4-26B-A4B-it-heretic is a 26 billion parameter instruction-tuned decensored variant of Google DeepMind's Gemma 4 model, specifically the 26B A4B MoE architecture. This multimodal model processes text and image inputs, generating text outputs, and is optimized for reasoning, coding, and agentic workflows. It features a 32768 token context length and achieves 88.3% on AIME 2026 no tools, making it suitable for applications requiring robust, less restrictive AI interactions.

Loading preview...

Model Overview

This model, coder3101/gemma-4-26B-A4B-it-heretic, is a decensored version of Google DeepMind's Gemma 4 26B A4B instruction-tuned model. It was created using the Heretic v1.2.0 tool with the Arbitrary-Rank Ablation (ARA) method, specifically designed to reduce refusals. While the original model had 100/100 refusals, this 'heretic' variant significantly lowers them to 11/100, with a KL divergence of 0.0499 compared to the original.

Key Capabilities

  • Multimodal Processing: Handles both text and image inputs, generating text outputs. The base Gemma 4 models also support video processing.
  • Efficient Architecture: Utilizes a Mixture-of-Experts (MoE) design with 25.2 billion total parameters but only 3.8 billion active parameters, allowing for faster inference comparable to a 4B model.
  • Extended Context Window: Supports a substantial 256K token context length, enabling complex, long-context tasks.
  • Enhanced Reasoning & Coding: Designed for strong reasoning capabilities, agentic workflows, and improved performance in coding benchmarks like LiveCodeBench v6 (77.1%) and Codeforces ELO (1718).
  • Native System Prompt Support: Includes native support for the system role, facilitating more structured and controllable conversations.

Good For

  • Applications requiring a less restrictive, decensored large language model.
  • Tasks involving complex reasoning, code generation, and agentic workflows.
  • Multimodal applications that need to process both text and images.
  • Scenarios where efficient inference is crucial, leveraging the MoE architecture's active parameter count.