Name: llmfan46/gemma-4-26B-A4B-it-qat-q4_0-unquantized-uncensored-heretic API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: llmfan46

Overview

This model, llmfan46/gemma-4-26B-A4B-it-qat-q4_0-unquantized-uncensored-heretic, is a decensored version of Google DeepMind's Gemma 4 26B A4B instruction-tuned model. It leverages a Mixture-of-Experts (MoE) architecture with 25.2 billion total parameters and 3.8 billion active parameters, allowing for efficient inference comparable to a 4B model. The primary differentiator is its significantly reduced refusal rate (11/100 vs. 100/100 for the original), achieved through the Heretic v1.2.0 tool using Arbitrary-Rank Ablation (ARA), while maintaining a low KL divergence of 0.0618, indicating strong preservation of the original model's quality and capabilities. It supports a 256K token context length and multimodal inputs (text, image).

Key Capabilities

Decensored Output: Achieves 89% fewer refusals compared to the original model, providing less restricted content generation.
Multimodal Understanding: Processes text and image inputs, with a vision encoder of ~550M parameters.
Efficient Inference: MoE architecture with 3.8B active parameters enables faster inference despite its 25.2B total parameters.
Long Context: Supports a substantial 256K token context window.
Reasoning: Includes a built-in reasoning mode for step-by-step thought processes.

Good For

Applications requiring less restrictive content generation or creative freedom.
Multimodal tasks involving text and image inputs where censorship is a concern.
Scenarios needing efficient inference on a large parameter model, benefiting from the MoE architecture's active parameter count.
Use cases demanding long context understanding and reasoning capabilities.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)