coder3101/gemma-4-26B-A4B-it-qat-q4_0-unquantized-heretic
coder3101/gemma-4-26B-A4B-it-qat-q4_0-unquantized-heretic is a 26 billion parameter instruction-tuned multimodal language model, a decensored version of Google DeepMind's Gemma 4 26B A4B model. It was produced with Heretic v1.2.0 using the Arbitrary-Rank Ablation (ARA) method, and it refuses far fewer requests than the original while maintaining strong performance. It has a 256K token context window and targets reasoning, coding, and multimodal understanding across text and image inputs.
Model Overview
This model, coder3101/gemma-4-26B-A4B-it-qat-q4_0-unquantized-heretic, is a 26 billion parameter instruction-tuned multimodal language model based on Google DeepMind's Gemma 4 family. It is a decensored variant, created using the Heretic v1.2.0 tool with the Arbitrary-Rank Ablation (ARA) method. The reported refusal count drops to 13 out of 100, compared to 100 out of 100 for the original model, at a KL divergence of 0.0660 from the original.
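To make the KL divergence figure concrete, here is a minimal sketch of how a KL divergence between two next-token distributions can be computed. The logits are made-up illustrative values, and the direction KL(original || modified) is an assumption; this is not Heretic's actual evaluation code.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a 3-token vocabulary (illustrative only).
original_logits = [2.0, 1.0, 0.1]   # assumed: the original model
modified_logits = [1.8, 1.2, 0.1]   # assumed: the ablated model

p = softmax(original_logits)
q = softmax(modified_logits)
kl = kl_divergence(p, q)
print(f"KL(original || modified) = {kl:.4f} nats")
```

A value near zero means the modified model's output distribution stays close to the original's on the prompts measured; identical distributions give exactly zero.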
Part of the Gemma 4 series, this model was trained with Quantization-Aware Training (QAT) for efficient memory usage; as the repository name indicates, these are the unquantized weights of the q4_0 QAT checkpoint. It uses a Mixture-of-Experts (MoE) architecture with 25.2 billion total parameters (the nominal 26B), of which only 3.8 billion are active during inference, so execution speed is closer to that of a 4B model. It supports a 256K token context window and handles both text and image inputs.
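The parameter counts above translate into rough numbers as follows. This is a back-of-the-envelope sketch: it assumes 16-bit storage for the unquantized weights and a nominal 4 bits per weight after quantization, and it ignores quantization scales, activations, and KV cache.

```python
TOTAL_PARAMS = 25.2e9   # total parameters, from the model description
ACTIVE_PARAMS = 3.8e9   # parameters active during inference, from the model description

def active_fraction(total, active):
    """Fraction of the parameters that participate in inference."""
    return active / total

def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # assumption: 16-bit weights
q4_gb = weight_memory_gb(TOTAL_PARAMS, 4)     # assumption: nominal 4-bit weights

print(f"active fraction: {frac:.1%}")
print(f"16-bit weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:  ~{q4_gb:.1f} GB")
```

Note that the MoE design reduces compute per token, not storage: all 25.2 billion parameters still have to be held in memory even though only about 15% of them are used at a time.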
Key Capabilities
- Decensored Behavior: Significantly reduced refusal rates compared to the base Gemma 4 model.
- Multimodal Understanding: Processes text and image inputs, with variable aspect ratio and resolution support.
- Reasoning: Strong reasoning ability, with configurable thinking modes.
- Long Context: Supports a 256K token context window for complex, long-form tasks.
- Efficient Architecture: Mixture-of-Experts (MoE) design with 3.8B active parameters for fast inference.
- Coding & Agentic Capabilities: Improved coding benchmark performance, plus native function-calling support.
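Refusal counts such as the 13 out of 100 quoted for this model are typically obtained by running a fixed prompt set and classifying each response. The sketch below shows one simple way to do that with substring markers; the marker list and sample responses are hypothetical, and this is not claimed to be the procedure Heretic uses.

```python
# Hypothetical refusal markers, chosen for illustration only.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "i am unable")

def is_refusal(response):
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses):
    """Return (number of refusals, number of responses)."""
    refused = sum(is_refusal(r) for r in responses)
    return refused, len(responses)

# Made-up sample responses.
samples = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is a step-by-step explanation.",
    "Here is the code you asked for.",
]
refused, total = refusal_rate(samples)
print(f"{refused}/{total} refusals")
```

Marker matching is crude: it misses refusals phrased in other ways and can flag helpful answers that happen to contain an apology, so counts produced this way are best read as approximate.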
Good for
- Applications requiring a less restrictive, decensored language model.
- Tasks involving complex reasoning and problem-solving.
- Multimodal applications that integrate text and image data.
- Code generation, completion, and agentic workflows.
- Scenarios that call for a large total parameter count at the inference cost of a much smaller model.