Name: grimjim/gemma-3-12b-it-MPOAdd-v1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: grimjim

gemma-3-12b-it-MPOAdd-v1: Enhanced Refusal Model

This model, developed by grimjim, is a variant of Google's instruction-tuned, 12-billion-parameter Gemma model (google/gemma-3-12b-it). Its core innovation is Magnitude-Preserving Orthogonal Addition (MPOAdd), a technique designed to significantly strengthen the model's refusal behavior around safety and perceived harms.

Key Capabilities & Differentiators

Exaggerated Refusal: Unlike conventional abliteration, which removes the refusal direction from the model, this model adds to the directional component of refusal. The result is a model that pushes back against perceived harms more forcefully.
Norm Preservation: The geometric adjustments preserve the norms of the intervened layers, which is crucial for maintaining model stability and performance.
Minimal Perplexity Loss: Despite these significant modifications to refusal behavior, the model demonstrates minimal perplexity loss when measured on Q8_0 GGUFs compared to its baseline, challenging the notion that such interventions inherently damage reasoning.
Projected Abliteration: The model incorporates techniques from "Projected Abliteration" and "Norm-Preserving Biprojected Abliteration" to precisely target and modify specific behavioral directions.
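
The listing does not spell out the exact MPOAdd procedure, so the following is only a minimal NumPy sketch of the general idea described above: amplify the component of a weight matrix along a chosen direction, then rescale so the original norms are kept. The function name, the `alpha` strength parameter, the choice of column norms, and the random "refusal direction" are all hypothetical and are not taken from the author's implementation.

```python
import numpy as np


def amplify_direction_preserving_norms(W, r, alpha=0.5):
    """Amplify each column's component along r, then restore column norms.

    W: (d_model, d_in) matrix whose columns write into the residual stream.
    r: (d_model,) direction vector (a hypothetical "refusal direction").
    alpha: hypothetical strength; alpha = 0 leaves W unchanged.
    """
    r = r / np.linalg.norm(r)                      # work with a unit direction
    original_norms = np.linalg.norm(W, axis=0, keepdims=True)
    coeffs = r @ W                                 # projection of each column onto r
    W_new = W + alpha * np.outer(r, coeffs)        # add more of the r component
    new_norms = np.linalg.norm(W_new, axis=0, keepdims=True)
    # Rescale every column back to its original norm (skip all-zero columns).
    scale = np.divide(original_norms, new_norms,
                      out=np.ones_like(new_norms), where=new_norms > 0)
    return W_new * scale


# Toy usage with random data (illustrative only, not real model weights).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
r = rng.normal(size=8)
W_add = amplify_direction_preserving_norms(W, r, alpha=0.5)
print(np.allclose(np.linalg.norm(W, axis=0), np.linalg.norm(W_add, axis=0)))
```

In this sketch, each column ends up pointing more toward the chosen direction while its length is unchanged, which is one way to read "adds the directional component" together with "norm preservation". Conventional abliteration would instead subtract the projected component.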

Ideal Use Cases

This model is particularly suited for applications where:

Strong Safety Enforcement is paramount, and an exaggerated refusal to harmful prompts is desired.
Experimental Research into model safety, refusal mechanisms, and geometric interventions is being conducted.
Controlled Environments require a model with explicitly amplified safety guardrails without significant degradation in general performance.