DogOnKeyboard/gemma-3-27b-it-heretic

Vision · Concurrency Cost: 2 · Model Size: 27B · Quant: FP8 · Ctx Length: 32k · Published: Nov 21, 2025 · License: Gemma · Architecture: Transformer · Cold

DogOnKeyboard/gemma-3-27b-it-heretic is a 27 billion parameter instruction-tuned Gemma 3 model, decensored using the Heretic v1.0.1 tool. Developed by Google DeepMind, the base Gemma 3 models are multimodal, handling text and image inputs with a 128K context window and multilingual support. This specific variant is notable for its significantly reduced refusal rate compared to the original model, making it suitable for applications requiring less restrictive content generation.


DogOnKeyboard/gemma-3-27b-it-heretic Overview

This model is a decensored version of the 27 billion parameter Gemma 3 instruction-tuned model, originally developed by Google DeepMind. It was created using the Heretic v1.0.1 tool to modify its refusal behavior.

Key Differentiators

  • Decensored Output: Compared to the original unsloth/gemma-3-27b-it model, this "Heretic" variant shows a drastically reduced refusal rate, dropping from 98/100 to 1/100 in tested scenarios. This makes it suitable for use cases where the original model's safety filters might be overly restrictive.
  • Multimodal Capabilities: The underlying Gemma 3 architecture supports both text and image inputs, generating text outputs. It can process images normalized to 896x896 resolution, encoded to 256 tokens each.
  • Large Context Window: The base Gemma 3 models feature a substantial 128K token input context window, allowing for extensive input processing.
  • Multilingual Support: Trained on a diverse dataset including content in over 140 languages, the model offers robust multilingual capabilities.

Performance & Training

The base Gemma 3 27B model was trained on 14 trillion tokens, encompassing web documents, code, mathematics, and images. It demonstrates strong performance across various benchmarks, including reasoning (e.g., 85.6 on HellaSwag), STEM (e.g., 78.6 on MMLU), and multimodal tasks (e.g., 85.6 on DocVQA). The decensoring process introduces a KL divergence of 0.38 relative to the original model.
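The 0.38 figure quantifies how far the decensored model's next-token distributions drift from the original's. As a refresher, KL divergence between two discrete distributions can be computed as follows (the toy distributions here are illustrative, not taken from either model):

```python
import math

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) in nats; assumes q is nonzero wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 3-token vocabulary
p = [0.70, 0.20, 0.10]  # stand-in for the decensored model
q = [0.60, 0.25, 0.15]  # stand-in for the original model
print(kl_divergence(p, q))  # small positive value: the models mostly agree
```

A KL of 0 would mean the two models are distributionally identical; 0.38 indicates a modest but measurable shift, consistent with an intervention targeted at refusal behavior rather than a full retrain.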

Intended Use Cases

This model is well-suited to content creation, conversational AI, text summarization, and image data extraction, particularly in applications where the original Gemma 3's refusal mechanisms might block desired outputs. Developers should weigh the ethical considerations associated with decensored models and implement appropriate safeguards of their own.
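Because the Heretic variant removes most refusal behavior, any safeguards become the application's responsibility. A deliberately simplistic sketch of an output-side filter is shown below; the blocklist contents and function name are illustrative, not part of any library, and a real deployment would use a dedicated moderation model or service instead:

```python
# Minimal output-safeguard sketch: a case-insensitive keyword blocklist
# applied to generated text before it reaches users. Purely illustrative;
# keyword matching alone is far too weak for production moderation.

BLOCKLIST = {"example-banned-term", "another-banned-term"}

def passes_safeguard(text: str) -> bool:
    """Return True if no blocklisted term appears in the text."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

print(passes_safeguard("A harmless completion."))        # True
print(passes_safeguard("contains EXAMPLE-BANNED-TERM"))  # False
```

In practice such a filter would sit between the model's generation step and the user-facing response, rejecting or regenerating outputs that fail the check.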