Sabomako/gemma-3-12b-it-heretic Overview
This model is a decensored version of google/gemma-3-12b-it, Google's 12-billion-parameter instruction-tuned language model. It was created with the Heretic v1.2.0 tool, using Magnitude-Preserving Orthogonal Ablation (MPOA) to modify the model's refusal behavior.
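The card does not reproduce Heretic's implementation, but the core idea of MPOA can be sketched: project the weight matrices that write into the residual stream onto the subspace orthogonal to an extracted "refusal direction", then rescale so weight magnitudes are unchanged. The sketch below is illustrative only; the function name, the column-wise rescaling rule, and the single-direction assumption are guesses, not Heretic's actual code.

```python
import numpy as np

def mpoa_sketch(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Illustrative magnitude-preserving orthogonal ablation.

    W: (d_out, d_in) weight matrix writing into the residual stream.
    refusal_dir: (d_out,) direction to remove from W's column space.
    Each column is projected orthogonally to the direction, then
    rescaled to its original L2 norm (the "magnitude-preserving" step).
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)
    col_norms = np.linalg.norm(W, axis=0, keepdims=True)
    W_abl = W - np.outer(r, r @ W)  # (I - r r^T) @ W
    abl_norms = np.linalg.norm(W_abl, axis=0, keepdims=True)
    return W_abl * (col_norms / np.clip(abl_norms, 1e-12, None))

# Sanity check: column norms preserved, refusal component removed.
W, r = np.random.randn(8, 4), np.random.randn(8)
W2 = mpoa_sketch(W, r)
assert np.allclose(np.linalg.norm(W2, axis=0), np.linalg.norm(W, axis=0))
assert np.allclose(r @ W2, 0.0, atol=1e-9)
```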
Key Characteristics & Performance
The primary modification targets the model's propensity to refuse requests, aiming for more open-ended generation while leaving other capabilities intact. The reported metrics quantify this trade-off:
- KL divergence from the original model: 0.024, indicating minimal drift in the output distribution
- Refusals: reduced from 97/100 for the original model to 4/100 for this version
The sharp drop in refusal rate means the model rarely declines requests on perceived content-restriction grounds, while the low KL divergence suggests the original model's underlying capabilities are largely preserved.
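The card does not state how the 0.024 figure was computed, and Heretic's exact protocol (prompt set, KL direction, averaging) is not documented here. As a rough sketch, one would compare next-token distributions of the original and ablated models on a shared prompt set:

```python
import numpy as np
from scipy.special import log_softmax

def next_token_kl(logits_orig: np.ndarray, logits_abl: np.ndarray) -> float:
    """KL(P_orig || P_abl) between two next-token distributions,
    computed from raw logits over the vocabulary."""
    log_p = log_softmax(logits_orig)
    log_q = log_softmax(logits_abl)
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))

# Averaging this quantity over many prompts and positions yields a
# single scalar like the 0.024 reported above (averaging scheme assumed).
```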
Abliteration Parameters
The modification was controlled by several abliteration parameters, including `direction_index` and the per-component ablation weights and positions for `attn.o_proj` and `mlp.down_proj`. Together these determine where in the network, and how strongly, the MPOA projection is applied to achieve the decensoring effect.
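The card lists these parameter names but not their values or schema. Purely as an illustration of how such parameters might shape the ablation, here is a hypothetical configuration and one plausible reading of a "weight and position" schedule; none of this is taken from Heretic's source, and every value is invented.

```python
# Hypothetical parameter shape; keys mirror the card, values are invented.
params = {
    "direction_index": 21,  # index of the extracted refusal direction (invented)
    "attn.o_proj":   {"max_weight": 0.9, "max_weight_position": 0.55},
    "mlp.down_proj": {"max_weight": 0.8, "max_weight_position": 0.60},
}

def layer_strength(depth: float, max_weight: float, position: float) -> float:
    """One plausible reading: ablation strength peaks at a relative layer
    depth ('position') and falls off linearly toward either end."""
    span = max(position, 1.0 - position)
    return max_weight * (1.0 - abs(depth - position) / span)

# Example: strength applied to attn.o_proj at layer 24 of 48.
cfg = params["attn.o_proj"]
print(layer_strength(24 / 48, cfg["max_weight"], cfg["max_weight_position"]))
```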
Use Cases
This model is suited to applications where less restrictive output is desired, such as creative writing, open-ended dialogue, or research into model safety and bias. Because refusals are now rare, users should plan for downstream content moderation where necessary.