llmfan46/gemma-4-31B-it-uncensored-heretic

Hugging Face
VISIONConcurrency Cost:2Model Size:31BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Apr 3, 2026License:apache-2.0Architecture:Transformer0.1K Open Weights Warm

llmfan46/gemma-4-31B-it-uncensored-heretic is a 31 billion parameter instruction-tuned Gemma 4 model, developed by llmfan46, that has been decensored using the Heretic v1.2.0 Arbitrary-Rank Ablation (ARA) method. It achieves 90% fewer refusals (10/100) compared to the original Google Gemma-4-31B-it model, while preserving model quality with a KL divergence of 0.0541. This model is optimized for use cases requiring less content restriction without significant performance degradation.

Loading preview...

Overview

This model, llmfan46/gemma-4-31B-it-uncensored-heretic, is a 31 billion parameter instruction-tuned variant of Google's Gemma 4 model. It has been specifically modified using the Heretic v1.2.0 Arbitrary-Rank Ablation (ARA) method to significantly reduce content refusals. The modification results in 90% fewer refusals (10/100) compared to the original model (99/100 refusals), while maintaining a low KL divergence of 0.0541, indicating strong preservation of the original model's quality and capabilities.

Key Capabilities & Performance

  • Decensored Output: Achieves a substantial reduction in content refusals, making it suitable for applications requiring less restrictive content generation.
  • Multimodal: Inherits Gemma 4's capabilities for processing text and image inputs, with a context window of up to 256K tokens.
  • Reasoning & Coding: Designed for strong reasoning, agentic workflows, and enhanced coding capabilities, including native function-calling support.
  • MMLU Performance: Maintains a high MMLU accuracy of 85.90%, closely matching the original model's 86.50%.

Good for

  • Developers seeking a powerful 31B parameter model with significantly reduced content moderation for broader application use cases.
  • Applications where the original Gemma 4's refusal rate was a limiting factor.
  • Tasks requiring robust reasoning, coding, and multimodal understanding with a preference for less constrained output.