mlabonne/gemma-3-4b-it-abliterated

Vision
4.3B
BF16
32768
Mar 16, 2025
License: gemma
Hugging Face
Overview

mlabonne/gemma-3-4b-it-abliterated is an uncensored version of Google's instruction-tuned gemma-3-4b-it model. It uses a new "abliteration" technique to remove refusal behaviors, which makes its responses more permissive. The developer, mlabonne, noted that Gemma 3 is more resilient to abliteration than other models such as Qwen 2.5, and experimented to reduce refusals while preserving the model's capabilities.

Key Capabilities

  • Reduced Refusals: Engineered for a very high prompt acceptance rate (>90%), so refusals to generate content are rare.
  • Abliteration Technique: Uses a layerwise abliteration method that computes a refusal direction from hidden states across most layers (7 to 29) and applies it with a symmetric refusal weight pattern.
  • Coherent Output: Despite the experimental nature of the abliteration, the model generally produces coherent and understandable text.
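
The layerwise idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the developer's code: it assumes the refusal direction is the normalized difference between mean hidden states on refused and accepted prompts (a common choice in abliteration write-ups), and it uses a hypothetical triangular weight pattern as one possible reading of "symmetric". The function names `refusal_direction`, `symmetric_weights`, and `ablate` are invented for this example.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def refusal_direction(refused_states, accepted_states):
    """Assumed definition: unit vector from the accepted mean to the refused mean."""
    diff = [a - b for a, b in zip(mean_vector(refused_states),
                                  mean_vector(accepted_states))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def symmetric_weights(first_layer=7, last_layer=29, peak=1.0):
    """Hypothetical pattern: weights rise to `peak` at the middle layer
    and fall off symmetrically toward both ends of the layer range."""
    mid = (first_layer + last_layer) / 2
    half = (last_layer - first_layer) / 2
    return {layer: peak * (1 - abs(layer - mid) / (half + 1))
            for layer in range(first_layer, last_layer + 1)}

def ablate(hidden, direction, weight):
    """Subtract `weight` times the projection of `hidden` onto `direction`."""
    proj = sum(h * d for h, d in zip(hidden, direction))
    return [h - weight * proj * d for h, d in zip(hidden, direction)]

# Toy 2-dimensional hidden states for a single layer.
direction = refusal_direction([[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
weights = symmetric_weights()
cleaned = ablate([3.0, 5.0], direction, weight=1.0)
```

With a weight of 1.0 the component along the refusal direction is removed entirely; smaller weights near the edges of the layer range remove only part of it.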

Good For

  • Uncensored Content Generation: Ideal for applications where a less restrictive or uncensored output is desired.
  • Experimental Use Cases: Suitable for researchers and developers exploring advanced model modification techniques and their impact on behavior.
  • Creative and Diverse Prompting: Can handle a wider range of prompts without generating refusal responses, potentially benefiting creative writing or open-ended conversational agents.

Limitations

  • Experimental Nature: The developer describes the technique as experimental, and the model occasionally produces garbled text (e.g., "It' my" instead of "It's my").
  • Recommended Parameters: The developer recommends generating with temperature=1.0, top_k=64, top_p=0.95.
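
To show what the three recommended parameters control, here is a simplified pure-Python sketch of temperature scaling followed by top-k and top-p (nucleus) filtering of a next-token distribution. It is an illustration only: the function name `filter_distribution` is hypothetical, and real inference libraries may apply these steps in a different order or with different tie-breaking.

```python
import math

def filter_distribution(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # Temperature: divide logits before the softmax (1.0 leaves them unchanged).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = sorted(((i, e / total) for i, e in enumerate(exps)),
                   key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most likely tokens, then renormalize.
    probs = probs[:top_k]
    mass = sum(p for _, p in probs)
    probs = [(i, p / mass) for i, p in probs]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in probs:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1.
    mass = sum(p for _, p in kept)
    return {token_id: p / mass for token_id, p in kept}

# Toy 4-token vocabulary with probabilities 0.5, 0.3, 0.15, 0.05.
toy_logits = [math.log(0.5), math.log(0.3), math.log(0.15), math.log(0.05)]
nucleus = filter_distribution(toy_logits, temperature=1.0, top_k=64, top_p=0.9)
```

In this toy case the least likely token is dropped by the top-p cutoff, and sampling would then draw from the remaining three.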