llmfan46/gemma-4-31B-it-qat-q4_0-unquantized-uncensored-heretic

Vision · Concurrency Cost: 2 · Model Size: 31B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: Jun 11, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

llmfan46/gemma-4-31B-it-qat-q4_0-unquantized-uncensored-heretic is a 31 billion parameter instruction-tuned Gemma 4 model, published by llmfan46 and based on Google DeepMind's Gemma 4 architecture. This version was decensored with the Heretic v1.2.0 tool, using Arbitrary-Rank Ablation (ARA), to reduce refusals substantially while keeping model quality close to the original. It is offered here with a 32,768-token context length and is intended for applications that need less restrictive content generation.


Model Overview

This model, llmfan46/gemma-4-31B-it-qat-q4_0-unquantized-uncensored-heretic, is a 31 billion parameter instruction-tuned variant of Google DeepMind's Gemma 4 model. It has been decensored using the Heretic v1.2.0 tool with the Arbitrary-Rank Ablation (ARA) method, applied to the attn.o_proj (attention output projection) components.
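To make the idea of ablating directions from an output projection concrete, here is a minimal sketch in plain Python. It is a generic rank-k projection, W' = (I - Q Qᵀ) W, and is not Heretic's ARA implementation, whose details this card does not describe. The function names, the toy matrix, and the choice of directions are all hypothetical, and the weight is assumed to have shape (d_out, d_in) so that the output is W x.

```python
import math

def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis

def ablate_output_directions(weight, directions):
    """Return a copy of `weight` (d_out x d_in) whose outputs have no
    component along any of `directions` (each of length d_out)."""
    basis = orthonormalize(directions)
    d_out, d_in = len(weight), len(weight[0])
    new_w = [row[:] for row in weight]
    for b in basis:
        # coeff[j] is the component of column j along the basis vector b.
        coeff = [sum(b[i] * new_w[i][j] for i in range(d_out)) for j in range(d_in)]
        for i in range(d_out):
            for j in range(d_in):
                new_w[i][j] -= b[i] * coeff[j]
    return new_w

def matvec(weight, x):
    """Plain matrix-vector product, used to check the result."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in weight]

# Toy stand-in for an o_proj weight (assumption: 3 outputs, 2 inputs).
toy_weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hypothetical directions to remove; k = 2 here.
toy_directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ablated = ablate_output_directions(toy_weight, toy_directions)
```

After this edit, any output of the ablated matrix is orthogonal to the removed directions. A real tool would also have to choose which directions to remove and how strongly, which is outside the scope of this sketch.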

Key Differentiators

  • Reduced Refusals: Refuses 11 of 100 prompts versus 99 of 100 for the original model, a relative reduction of about 89%, at a KL divergence of 0.0365 from the original. The low divergence suggests that the original model's behavior is largely preserved.
  • Gemma 4 Foundation: Inherits core capabilities from the Gemma 4 family, including multimodal support (text and image) and strong reasoning and coding abilities. The family supports a 256K token context window, while this listing is served with a 32k context length.
  • Quantization-Aware Training (QAT): Based on a QAT-optimized checkpoint (q4_0 target, per the model name), which is intended to keep quality close to bfloat16 when the weights are quantized, reducing memory requirements.
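As a rough illustration of the memory point above, the following sketch estimates weight-only storage for a 31B-parameter model at several precisions. It assumes exactly 31e9 parameters (the nominal size quoted on this card) and a GGUF-style q4_0 layout of 32 four-bit weights plus one 16-bit scale per block, which works out to 4.5 bits per weight. Activations, KV cache, and runtime overhead are ignored, so real usage will be higher.

```python
PARAMS = 31e9  # nominal parameter count from this card (assumption: exact)

# Bits stored per weight for each format.
BITS_PER_WEIGHT = {
    "bfloat16": 16.0,
    "fp8": 8.0,
    "q4_0": (32 * 4 + 16) / 32,  # 4.5 bits: 32 4-bit weights + one fp16 scale
}

def weight_gigabytes(params, bits_per_weight):
    """Weight-only size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

estimates = {
    name: weight_gigabytes(PARAMS, bits)
    for name, bits in BITS_PER_WEIGHT.items()
}

for name, gigabytes in estimates.items():
    print(f"{name:>8}: ~{gigabytes:.1f} GB of weights")
```

Under these assumptions the weights take roughly 62 GB in bfloat16, 31 GB in FP8, and a little over 17 GB in q4_0.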

Performance

The reduction in refusals comes with a small drop in MMLU accuracy: 84.46% for the Heretic version versus 86.17% for the original, a decrease of 1.71 percentage points.
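The figures quoted on this card can be checked with a few lines of arithmetic, and the KL divergence metric can be illustrated on a toy example. The sketch below assumes KL(original || modified) measured in nats over next-token distributions. The card does not state the direction, the units, or the prompts used, and the two toy distributions are invented purely for illustration.

```python
import math

def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support.
    Terms with p_i == 0 contribute nothing; q_i must be > 0 where p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Refusal counts from the card: 99/100 (original) vs. 11/100 (Heretic).
refusal_drop = relative_reduction(99, 11)

# MMLU accuracy from the card, in percentage points.
mmlu_delta = 84.46 - 86.17

# Toy next-token distributions (hypothetical, not measured values).
original_probs = [0.7, 0.2, 0.1]
modified_probs = [0.6, 0.3, 0.1]
toy_kl = kl_divergence(original_probs, modified_probs)

print(f"relative refusal reduction: {refusal_drop:.1%}")
print(f"MMLU change: {mmlu_delta:+.2f} points")
print(f"toy KL divergence: {toy_kl:.4f} nats")
```

A KL divergence of zero would mean the two distributions are identical, so smaller values indicate that the modified model's predictions stay closer to the original's.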

Good for

  • Use cases requiring less restrictive content generation and fewer refusals.
  • Applications benefiting from the Gemma 4 architecture's multimodal capabilities and long context handling.
  • Developers seeking a powerful, uncensored 31B parameter model for text and image tasks.