Name: coder3101/gemma-4-26B-A4B-it-qat-q4_0-unquantized-heretic API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: coder3101

Model Overview

This model, coder3101/gemma-4-26B-A4B-it-qat-q4_0-unquantized-heretic, is an instruction-tuned multimodal language model based on the 26B-A4B model from Google DeepMind's Gemma 4 family. It is a decensored variant created with the Heretic v1.2.0 tool using the Arbitrary-Rank Ablation (ARA) method. The modification cuts refusals to 13/100, compared with 100/100 for the original model, while keeping the KL divergence from the original model low at 0.0660.
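
The 0.0660 figure measures how far the modified model's output distribution drifts from the original's. Lower values mean ordinary behavior is better preserved. The minimal Python sketch below computes KL divergence between two next-token distributions. The logits are made up for illustration. The direction KL(original || modified) and the use of natural logarithms (nats) are assumptions, and the card does not state how Heretic aggregates the metric across prompts.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical next-token logits from the original and the modified model
original_logits = [2.0, 1.0, 0.1, -1.0]
modified_logits = [1.9, 1.1, 0.1, -0.9]

p = softmax(original_logits)
q = softmax(modified_logits)
print(f"KL(original || modified) = {kl_divergence(p, q):.4f} nats")
```

Identical distributions give a divergence of exactly 0. A model whose predictions barely shift on everyday prompts stays close to that value.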

Like the rest of the Gemma 4 series, this model is optimized with Quantization-Aware Training (QAT), which prepares the weights for low-bit quantization and so reduces memory usage. It uses a Mixture-of-Experts (MoE) architecture with 25.2 billion total parameters, of which only 3.8 billion are active per token during inference. This gives it inference speed roughly comparable to a 4B model. It supports a 256K-token context window and accepts both text and image inputs.
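
The 25.2B-total / 3.8B-active split comes from sparse routing. A small router scores every expert, but only the top-scoring few actually run for each token. The pure-Python sketch below illustrates generic top-k routing. The number of experts, the value of k, the linear router, and the toy experts are all hypothetical, since the card does not describe Gemma 4's router. In typical MoE transformers, the active count also includes components every token uses, such as attention and embeddings. For that reason, 3.8/25.2 (about 15%) is not simply the fraction of experts selected.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_moe(x, router_weights, experts, k):
    """Route vector x to the k highest-scoring experts and mix their outputs.

    router_weights: one weight vector per expert (a hypothetical linear router).
    experts: callables mapping a vector to a vector of the same length.
    Only the k selected experts are evaluated; the others never run.
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])  # renormalize over selected experts
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top


calls = []  # records which experts were actually evaluated


def make_expert(idx, scale):
    """Toy expert: scales its input; logs its index when called."""
    def expert(v):
        calls.append(idx)
        return [scale * vi for vi in v]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
out, chosen = top_k_moe([1.0, 2.0], router_weights, experts, k=2)
print("selected experts:", chosen, "evaluated:", calls)
print(f"active share reported for this model: {3.8 / 25.2:.1%}")
```

Compute per token scales with the selected experts plus the shared layers, not with the total parameter count. This is why the model can run at speeds near a 4B model while storing 25.2B parameters.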

Key Capabilities

Decensored Behavior: Significantly reduced refusal rates compared to the base Gemma 4 model.
Multimodal Understanding: Processes text and image inputs, with variable aspect ratio and resolution support.
Reasoning: Strong reasoning abilities, with configurable thinking modes.
Long Context: Supports a 256K token context window for complex, long-form tasks.
Efficient Architecture: Mixture-of-Experts (MoE) design with 3.8B active parameters for fast inference.
Coding & Agentic Capabilities: Improved performance on coding benchmarks and native function-calling support.
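
A 256K-token window is a shared budget: prompt tokens and generated tokens both count against it. The sketch below checks a request against that budget. It assumes "256K" means 256 x 1,024 = 262,144 tokens, which the card does not spell out. A hosted API may also enforce a smaller limit, so treat CONTEXT_WINDOW as a placeholder to adjust.

```python
# Assumption: "256K" is read as 256 * 1024 = 262,144 tokens. The serving
# provider's documented limit may be lower; adjust this constant if so.
CONTEXT_WINDOW = 256 * 1024


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the requested completion fits the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= window


def max_completion_tokens(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Largest completion budget left after the prompt (0 if the prompt fills it)."""
    return max(0, window - prompt_tokens)


print(fits_in_context(250_000, 8_000))
print(max_completion_tokens(260_000))
```

Under these assumptions, a 250,000-token prompt with an 8,000-token completion fits. A 260,000-token prompt leaves room for only 2,144 generated tokens.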

Good for

Applications requiring a less restrictive, decensored language model.
Tasks involving complex reasoning and problem-solving.
Multimodal applications that integrate text and image data.
Code generation, completion, and agentic workflows.
Scenarios that call for the capacity of a large model at the inference cost of a much smaller one.
