YanLabs/gemma-3-4b-it-abliterated-normpreserve

Vision | Concurrency Cost: 1 | Model Size: 4.3B | Quant: BF16 | Ctx Length: 32k | Published: Dec 9, 2025 | License: gemma | Architecture: Transformer

YanLabs/gemma-3-4b-it-abliterated-normpreserve is a 4.3 billion parameter causal language model developed by YanLabs and based on Google's Gemma-3-4b-it. It has undergone norm-preserving biprojected abliteration to remove safety guardrails and refusal mechanisms. It is intended for mechanistic interpretability research and analysis of LLM safety mechanisms, not for general-purpose applications.

Overview

This model, developed by YanLabs, is an abliterated version of the google/gemma-3-4b-it causal language model. It uses norm-preserving biprojected abliteration to remove refusal behaviors and safety guardrails from the model's activation space. Unlike traditional fine-tuning, this method does not retrain the model; it aims to preserve the model's original capabilities while eliminating refusal responses.
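The idea behind the technique named above can be illustrated with a toy sketch. The source does not publish the exact procedure, so everything here is an assumption: that "abliteration" means projecting a refusal direction out of weight vectors, that "biprojected" means first removing from that direction whatever it shares with a harmless direction, and that "norm-preserving" means rescaling each edited vector back to its original length. The function names and the tiny 3-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def unit(a):
    n = norm(a)
    if n == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / n for x in a]

def remove_component(v, direction):
    """Return v with its component along the unit vector `direction` removed."""
    c = dot(v, direction)
    return [x - c * d for x, d in zip(v, direction)]

def biproject(refusal_dir, harmless_dir):
    """Assumed 'biprojection' step: make the refusal direction orthogonal
    to a harmless direction, then renormalise it to unit length."""
    return unit(remove_component(refusal_dir, unit(harmless_dir)))

def ablate_norm_preserving(weight_vectors, direction):
    """Project each weight vector off `direction`, then rescale it so its
    Euclidean norm matches the norm it had before the edit."""
    edited = []
    for w in weight_vectors:
        original = norm(w)
        projected = remove_component(w, direction)
        p = norm(projected)
        # A vector lying entirely along `direction` collapses to zero;
        # its norm cannot be restored, so it is left as the zero vector.
        scale = original / p if p > 0.0 else 0.0
        edited.append([x * scale for x in projected])
    return edited

# Toy data (hypothetical, not taken from the model).
refusal = [1.0, 1.0, 0.0]
harmless = [0.0, 1.0, 0.0]
direction = biproject(refusal, harmless)
weights = [[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]]
ablated = ablate_norm_preserving(weights, direction)

for before, after in zip(weights, ablated):
    print(before, "->", [round(x, 4) for x in after])
```

In a real model the same operation would be applied to selected weight matrices rather than to hand-written vectors; which matrices and layers are edited is not stated in the source.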

Key Characteristics

  • Abliterated Safety Mechanisms: Safety guardrails and refusal mechanisms have been deliberately removed.
  • Norm-Preserving Biprojection: Alters model behavior through norm-preserving biprojected abliteration rather than traditional retraining.
  • Research Focus: Primarily intended for mechanistic interpretability research to understand how LLM safety mechanisms function.
  • Base Model: Derived from google/gemma-3-4b-it.

Intended Use Cases

  • Mechanistic Interpretability Research: Studying the internal workings of large language models.
  • LLM Safety Analysis: Investigating the nature and removal of safety mechanisms.
  • Abliteration Technique Development: Testing and refining methods for modifying model behavior.
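As an illustration of the kind of analysis these use cases involve, the sketch below estimates a candidate refusal direction as the difference between mean activations on two prompt sets, then measures how strongly a set of activations still projects onto it. Difference of means is a commonly used approach in abliteration work, but the source does not say it was used for this model. The function names and the toy activation values are hypothetical; real activations would be captured from the model's hidden states.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def difference_of_means(harmful_acts, harmless_acts):
    """Unit vector pointing from the harmless mean to the harmful mean."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts),
                                  mean_vector(harmless_acts))]
    length = math.sqrt(sum(x * x for x in diff))
    if length == 0.0:
        raise ValueError("the two activation sets have identical means")
    return [x / length for x in diff]

def mean_abs_projection(acts, direction):
    """Average absolute projection of each activation onto `direction`."""
    total = sum(abs(sum(a * d for a, d in zip(row, direction))) for row in acts)
    return total / len(acts)

# Toy activations (hypothetical values, one row per prompt).
harmful_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

direction = difference_of_means(harmful_acts, harmless_acts)
print("candidate direction:", direction)
print("harmful projection: ", mean_abs_projection(harmful_acts, direction))
print("harmless projection:", mean_abs_projection(harmless_acts, direction))
```

Comparing the projection before and after an edit gives a rough measure of how much of the candidate direction remains, which is one way to test whether an abliteration method did what it was meant to do.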

Important Limitations

  • No Safety Guarantees: Abliteration does not guarantee that every refusal is removed, and the model may generate harmful content.
  • Not for Production: Not intended for production deployments or user-facing applications.
  • Unpredictable Behavior: Model behavior may be unpredictable in certain edge cases due to the removal of safety features.