DogOnKeyboard/gemma-3-27b-it-heretic Overview
This model is a decensored version of the 27 billion parameter Gemma 3 instruction-tuned model, originally developed by Google DeepMind. It was created with the Heretic v1.0.1 tool, which uses directional ablation ("abliteration") to suppress the model's refusal behavior.
Key Differentiators
- Decensored Output: Compared to the original unsloth/gemma-3-27b-it model, this "Heretic" variant shows a drastically reduced refusal rate, dropping from 98/100 to 1/100 in tested scenarios. This makes it suitable for use cases where the original model's safety filters might be overly restrictive.
- Multimodal Capabilities: The underlying Gemma 3 architecture supports both text and image inputs, generating text outputs. It can process images normalized to 896x896 resolution, each encoded to 256 tokens.
- Large Context Window: The base Gemma 3 models feature a substantial 128K token input context window, allowing for extensive input processing.
- Multilingual Support: Trained on a diverse dataset including content in over 140 languages, the model offers robust multilingual capabilities.
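The multimodal input described above can be exercised through the Hugging Face transformers multimodal chat format. The sketch below builds a combined image-plus-text request; the model ID is taken from this card, while the image URL and prompt are placeholder assumptions, and the actual pipeline call is left as a comment because it downloads the full 27B checkpoint.

```python
# Sketch: structuring a multimodal chat request for this model with the
# transformers "image-text-to-text" pipeline. Only the message-building
# step runs here; the generation call (commented out) needs the weights.

MODEL_ID = "DogOnKeyboard/gemma-3-27b-it-heretic"

def build_messages(image_url: str, prompt: str) -> list[dict]:
    """Pair one image with one text instruction in a single user turn,
    following the standard transformers multimodal chat schema."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]

# Hypothetical image URL and prompt, for illustration only.
messages = build_messages(
    "https://example.com/invoice.png",
    "Extract the total amount from this invoice.",
)

# from transformers import pipeline
# pipe = pipeline("image-text-to-text", model=MODEL_ID, device_map="auto")
# result = pipe(text=messages, max_new_tokens=128)
```

Each image in such a request is resized to 896x896 and contributes 256 tokens to the context, so even several images leave most of the 128K window free for text.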
Performance & Training
The base Gemma 3 27B model was trained on 14 trillion tokens, encompassing web documents, code, mathematics, and images. It demonstrates strong performance across various benchmarks, including reasoning (e.g., 85.6 on HellaSwag), STEM (e.g., 78.6 on MMLU), and multimodal tasks (e.g., 85.6 on DocVQA). The decensoring process introduces a KL divergence of 0.38 from the original model's output distribution, indicating only modest drift in next-token predictions.
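The reported KL divergence of 0.38 quantifies how far the decensored model's next-token distribution drifts from the original's on the same prompts. A minimal sketch of that computation, using two made-up three-token distributions (the real metric is averaged over many prompts and a full vocabulary):

```python
import math

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.
    P: original model's next-token distribution, Q: decensored model's.
    The distributions below are hypothetical illustrations only."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.70, 0.20, 0.10]  # hypothetical original distribution
q = [0.60, 0.25, 0.15]  # hypothetical decensored distribution

print(round(kl_divergence(p, q), 4))  # → 0.0227
```

A value of 0.0 would mean identical behavior; the 0.38 reported for this model reflects a deliberate, bounded shift rather than a wholesale change in the model's predictions.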
Intended Use Cases
This model is well-suited for applications requiring less constrained text generation: content creation, conversational AI, text summarization, and extracting data from images, particularly where the original Gemma 3's refusal mechanisms might block desired outputs. Developers should be aware of the ethical considerations associated with decensored models and implement appropriate safeguards of their own.