Model Overview
grayarea/gemma-3-4b-it-heretic-v1.2 is a 4.3-billion-parameter instruction-tuned model based on Google's Gemma 3 architecture; specifically, it is a decensored version of google/gemma-3-4b-it. It was created using Heretic v1.2.0, a method that removes the content moderation and refusal behaviors present in the original model.
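Heretic quantifies how far the modified model drifts from the original by measuring the KL divergence between their output distributions (the 0.0621 figure reported below). A minimal sketch of that metric for a single next-token distribution, using the standard KL formulation rather than Heretic's exact implementation:

```python
import numpy as np

def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    """KL(P || Q) between two next-token distributions given as logits."""
    # Numerically stable softmax for each distribution.
    p = np.exp(p_logits - p_logits.max()); p /= p.sum()
    q = np.exp(q_logits - q_logits.max()); q /= q.sum()
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Identical logits give zero divergence; a constant shift also gives zero,
# because softmax is shift-invariant.
base = np.array([2.0, 1.0, 0.5, -1.0])
assert abs(kl_divergence(base, base)) < 1e-12
assert abs(kl_divergence(base, base + 3.0)) < 1e-9
```

In practice this would be averaged over many prompts and token positions; a value near zero means the decensored model's predictions remain close to the original's on benign inputs.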
Key Differentiators
- Decensored Output: Achieves zero refusals on a 108-prompt test set on which the original model refused 107, while a KL divergence of 0.0621 from the original model indicates minimal drift in output distributions.
- Performance Retention: Benchmarks show that the decensoring process largely preserves, and in places slightly improves, performance. For instance, Wikitext-2 perplexity improved from 11.3185 (original, Q4_K_M) to 11.2203 (Heretic, Q4_K_M), and HellaSwag held steady at 70.75%.
- Abliteration Parameters: The model was abliterated with MPOA (Magnitude-Preserving Orthogonal Ablation) enabled, full row renormalization, and a Winsorization quantile of 0.995, specifically configured for Gemma 3.
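The listed settings can be read as one pipeline: project each weight row onto an extracted refusal direction, winsorize the projection coefficients at the 0.995 quantile, subtract the (clipped) component, then rescale every row back to its original magnitude (the "magnitude-preserving" and "full row renormalization" parts). A rough NumPy sketch under those assumptions; `ablate_rows` and the exact ordering are illustrative, not Heretic's actual implementation:

```python
import numpy as np

def ablate_rows(W: np.ndarray, direction: np.ndarray,
                winsor_q: float = 0.995) -> np.ndarray:
    """Illustrative magnitude-preserving orthogonal ablation (assumed steps):
    remove each row's component along `direction`, winsorizing projection
    magnitudes and restoring each row's original norm afterwards."""
    d = direction / np.linalg.norm(direction)
    proj = W @ d                               # per-row projection coefficients
    cap = np.quantile(np.abs(proj), winsor_q)  # winsorization at q = 0.995
    proj = np.clip(proj, -cap, cap)
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_abl = W - np.outer(proj, d)              # orthogonal ablation
    new_norms = np.linalg.norm(W_abl, axis=1, keepdims=True)
    # Full row renormalization: restore each row's original magnitude.
    return W_abl * (orig_norms / np.maximum(new_norms, 1e-12))

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))
d = rng.normal(size=8)
W2 = ablate_rows(W, d)
# Row magnitudes are preserved after ablation.
assert np.allclose(np.linalg.norm(W2, axis=1), np.linalg.norm(W, axis=1))
```

Winsorizing the projections keeps a few extreme rows from being altered disproportionately, and the final rescaling is what distinguishes MPOA from plain orthogonal ablation, which would shrink row magnitudes.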
Use Cases
This model is suitable for applications requiring a less restrictive instruction-tuned Gemma 3 4B model, particularly where the original model's refusal policies would block desired outputs. Developers seeking a more permissive language model for creative or uncensored content generation, while maintaining a performance profile close to the base gemma-3-4b-it, may find this model appropriate.
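A "similar performance profile" here rests largely on the perplexity figures above; perplexity is simply the exponential of the mean per-token negative log-likelihood, so lower is better. A minimal sketch, assuming per-token log-probabilities are already available:

```python
import numpy as np

def perplexity(token_logprobs: np.ndarray) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return float(np.exp(-np.mean(token_logprobs)))

# A model assigning every token probability 1/11.32 scores perplexity
# ~11.32, the scale of the Wikitext-2 numbers reported above.
lp = np.log(np.full(1000, 1 / 11.32))
assert abs(perplexity(lp) - 11.32) < 1e-9
```

The small perplexity gap between the original and Heretic quantizations (11.3185 vs. 11.2203) therefore corresponds to a nearly identical per-token likelihood.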