wvnvwn/gemma-2-9b-it-only-rsn-tuned-lr3e-5

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:9BQuant:FP8Ctx Length:16kPublished:May 3, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The wvnvwn/gemma-2-9b-it-only-rsn-tuned-lr3e-5 model is a 9 billion parameter instruction-tuned language model, based on the Llama-3.2-3B-Instruct architecture. It has been fine-tuned using the Safety Neuron Tuning (SN-Tune) method on safety alignment data. This approach enhances safety alignment by selectively fine-tuning only critical safety neurons, while preserving general capabilities. It is primarily designed for applications requiring improved safety alignment with minimal impact on core performance.

Loading preview...

Overview

This model, wvnvwn/gemma-2-9b-it-only-rsn-tuned-lr3e-5, is a 9 billion parameter instruction-tuned language model derived from the meta-llama/Llama-3.2-3B-Instruct base model. Its key differentiator is the application of Safety Neuron Tuning (SN-Tune), a specialized fine-tuning method aimed at enhancing safety alignment.

Key Capabilities

  • Enhanced Safety Alignment: Achieved through SN-Tune, which focuses on detecting and fine-tuning only a small set of "safety neurons" on dedicated safety datasets like Circuit Breakers.
  • Preservation of General Capabilities: The SN-Tune method freezes non-safety parameters, ensuring that the model's original general performance is largely maintained while improving safety.
  • Parameter-Efficient Fine-tuning: By only adjusting a limited number of safety-critical neurons, this approach offers an efficient way to instill safety without extensive retraining.

Good For

  • Applications requiring improved safety: Ideal for scenarios where mitigating harmful outputs is a priority without significantly altering the base model's core functionalities.
  • Research into safety alignment techniques: Demonstrates a novel, selective fine-tuning approach for safety.
  • Developers seeking a Llama-3.2-3B-Instruct variant with enhanced safety features.