grimjim/gemma-3-12b-it-orthogonal-reflection-bounded-ablation-v4-12B is a 12-billion-parameter Gemma-3-IT model modified with Orthogonal Reflection Bounded Ablation (ORBA) to reduce specific refusal behaviors. The method combines directional steering with row-wise norm clamping to geometrically ablate selected refusal personas while preserving the model's safety knowledge and awareness. It is intended for applications that need finer control over refusal responses, and it leaves the model's vision capabilities unaffected.
Model Overview
This model, gemma-3-12b-it-orthogonal-reflection-bounded-ablation-v4-12B, is a 12-billion-parameter Gemma-3-IT variant developed by grimjim. It applies a novel technique, Orthogonal Reflection Bounded Ablation (ORBA), to selected layers. In those layers it modifies both the mlp.down_proj.weight and self_attn.o_proj.weight matrices, the two output projections that write into the residual stream.
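The following is a minimal sketch of how the targeted weights might be picked out of a checkpoint by parameter name. Several parts are assumptions. The naming pattern (`...layers.N...`, `vision_tower`, `multi_modal_projector`) follows the Hugging Face Gemma 3 layout but varies across `transformers` versions. The layer set is hypothetical, since the card does not say which layers were modified. `select_ablation_targets` is an illustrative helper and is not part of any library. Skipping vision-tower and projector parameters mirrors the card's statement that the vision stack is left untouched.

```python
import re
from typing import Iterable, List, Set

# Suffixes of the two output projections named in the model card.
TARGET_SUFFIXES = ("mlp.down_proj.weight", "self_attn.o_proj.weight")
# Extracts the decoder layer index from names like "...layers.10.mlp...".
LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def select_ablation_targets(param_names: Iterable[str], layers: Set[int]) -> List[str]:
    """Return down_proj / o_proj weight names in the chosen language-model layers.

    Hypothetical helper: parameter names assume a Hugging Face-style Gemma 3
    checkpoint. Vision-tower and multimodal-projector parameters are skipped
    so the vision stack is never modified.
    """
    selected = []
    for name in param_names:
        if "vision_tower" in name or "multi_modal_projector" in name:
            continue  # leave the vision stack untouched
        if not name.endswith(TARGET_SUFFIXES):
            continue  # only the two residual-stream output projections
        match = LAYER_RE.search(name)
        if match and int(match.group(1)) in layers:
            selected.append(name)
    return selected
```

For example, `select_ablation_targets(names, layers=set(range(20, 30)))` would return only the down_proj and o_proj weights of layers 20 through 29, whichever layers were actually used for this model.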
Key Capabilities & Innovations
- Refusal Behavior Ablation: Selected refusal behaviors have been geometrically ablated using directional steering and Householder reflection, with the aim of neutralizing refusal personas while preserving safety knowledge.
- Numerical Stability: Row-wise norm clamping keeps the magnitude of each weight row bounded, so the edit conserves norms rather than inflating them. Magnitude clipping by Winsorization at 0.995 was also applied to prevent token-level glitching, which is a particular risk under the GeGLU activation function.
- Vision Stack Intact: The model's inherent vision capabilities remain untouched by the ablation process.
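The mechanisms in the list above (a reflection along a refusal direction, row-wise norm clamping, and Winsorization at 0.995) can be combined in a short NumPy sketch. It is a minimal illustration under stated assumptions and is not grimjim's implementation. The following choices are all assumptions:

- estimating the refusal direction as a difference of means over per-feature Winsorized activations;
- reading "0.995" as a quantile threshold;
- a hypothetical `strength` parameter that interpolates between projection (1) and a full Householder reflection (2);
- clamping each row to at most its original norm.

The function names are also hypothetical.

```python
import numpy as np


def winsorize(acts: np.ndarray, q: float = 0.995) -> np.ndarray:
    """Per-feature symmetric Winsorization (assumed reading of "0.995").

    acts: [n_samples, d_model]. Each column is clipped to +/- its q-quantile
    of absolute values, taming outlier activations before averaging.
    """
    t = np.quantile(np.abs(acts), q, axis=0)
    return np.clip(acts, -t, t)


def refusal_direction(harmful: np.ndarray, harmless: np.ndarray, q: float = 0.995) -> np.ndarray:
    """Unit-length difference of means between Winsorized activation sets (assumption)."""
    diff = winsorize(harmful, q).mean(axis=0) - winsorize(harmless, q).mean(axis=0)
    return diff / np.linalg.norm(diff)


def bounded_reflection(W: np.ndarray, r: np.ndarray, strength: float = 2.0) -> np.ndarray:
    """Remove or reflect the r-component of W's output, then clamp row norms.

    W uses PyTorch layout [out_features = d_model, in_features]; r is a unit
    vector in d_model. strength=1 projects the component out (plain directional
    ablation); strength=2 applies the Householder reflection H = I - 2 r r^T,
    flipping the component's sign.
    """
    if not 0.0 <= strength <= 2.0:
        raise ValueError("strength must lie in [0, 2]")
    W_new = W - strength * np.outer(r, r @ W)
    # Row-wise norm clamping: no output row may end up larger than it was.
    # Rescaling rows means the r-component is generally no longer removed or
    # flipped exactly; this trades exactness for bounded magnitudes.
    old = np.linalg.norm(W, axis=1, keepdims=True)
    new = np.linalg.norm(W_new, axis=1, keepdims=True)
    scale = np.minimum(1.0, old / np.maximum(new, 1e-12))
    return W_new * scale
```

Because H is orthogonal, `strength=2.0` preserves the norm of every output vector before clamping. Intermediate strengths shrink the refusal component without fully reversing it.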
When to Use This Model
- Controlled Response Generation: Suited to use cases where mitigating specific refusal behaviors is critical, enabling more compliant or directed outputs.
- Safety-Conscious Applications: Suitable for applications requiring a model that retains its safety awareness but has reduced tendencies for certain refusal patterns.
- Vision-Integrated Tasks: Suitable for tasks that combine image and text inputs at the 12B scale, with the benefit of reduced refusal behavior.