coder3101/gemma-4-26B-A4B-it-heretic
The coder3101/gemma-4-26B-A4B-it-heretic is a 26 billion parameter instruction-tuned decensored variant of Google DeepMind's Gemma 4 model, specifically the 26B A4B MoE architecture. This multimodal model processes text and image inputs, generating text outputs, and is optimized for reasoning, coding, and agentic workflows. It features a 32768 token context length and achieves 88.3% on AIME 2026 no tools, making it suitable for applications requiring robust, less restrictive AI interactions.
Loading preview...
Model Overview
This model, coder3101/gemma-4-26B-A4B-it-heretic, is a decensored version of Google DeepMind's Gemma 4 26B A4B instruction-tuned model. It was created using the Heretic v1.2.0 tool with the Arbitrary-Rank Ablation (ARA) method, specifically designed to reduce refusals. While the original model had 100/100 refusals, this 'heretic' variant significantly lowers them to 11/100, with a KL divergence of 0.0499 compared to the original.
Key Capabilities
- Multimodal Processing: Handles both text and image inputs, generating text outputs. The base Gemma 4 models also support video processing.
- Efficient Architecture: Utilizes a Mixture-of-Experts (MoE) design with 25.2 billion total parameters but only 3.8 billion active parameters, allowing for faster inference comparable to a 4B model.
- Extended Context Window: Supports a substantial 256K token context length, enabling complex, long-context tasks.
- Enhanced Reasoning & Coding: Designed for strong reasoning capabilities, agentic workflows, and improved performance in coding benchmarks like LiveCodeBench v6 (77.1%) and Codeforces ELO (1718).
- Native System Prompt Support: Includes native support for the
systemrole, facilitating more structured and controllable conversations.
Good For
- Applications requiring a less restrictive, decensored large language model.
- Tasks involving complex reasoning, code generation, and agentic workflows.
- Multimodal applications that need to process both text and images.
- Scenarios where efficient inference is crucial, leveraging the MoE architecture's active parameter count.