wvnvwn/gemma-2-9b-it-gsm8k-rsn-tuned-lr3e-5

Text Generation | Concurrency Cost: 1 | Model Size: 9B | Quant: FP8 | Ctx Length: 16k | Published: May 3, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

wvnvwn/gemma-2-9b-it-gsm8k-rsn-tuned-lr3e-5 is a 9 billion parameter instruction-tuned model fine-tuned with the Safety Neuron Tuning (SN-Tune) method, which updates only the neurons identified as critical for safety. The card names Llama-3.2-3B-Instruct as the base architecture, although the gemma-2-9b-it prefix in the repository name and the 9B parameter count point to a Gemma 2 9B Instruct base. The model aims to improve safety performance while preserving general capabilities, which makes it a candidate for applications that need strong safety behavior.


Model Overview

wvnvwn/gemma-2-9b-it-gsm8k-rsn-tuned-lr3e-5 is a 9 billion parameter instruction-tuned model. Its card lists meta-llama/Llama-3.2-3B-Instruct as the base, while the gemma-2-9b-it prefix and the parameter count suggest a Gemma 2 9B Instruct base. It has been fine-tuned with Safety Neuron Tuning (SN-Tune), an approach focused on improving the model's safety alignment.

Key Capabilities & Features

  • Safety Neuron Tuning (SN-Tune): The model is trained with a selective fine-tuning method that identifies "safety neurons", a small subset of neurons deemed critical for safety, and updates only those. All other parameters remain frozen during this process.
  • Enhanced Safety Alignment: By targeting only safety-critical neurons, the model aims to improve safety performance and reduce undesirable outputs relative to its base model.
  • Parameter-Efficient Fine-tuning: Because SN-Tune updates only a limited number of parameters, fine-tuning requires less computation than updating the full model.
  • Preservation of General Capabilities: The selective tuning approach is designed to have minimal impact on the model's general capabilities, so it remains usable for a wide range of instruction-following tasks.
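The selective update described in the list above can be illustrated with a toy example. This is a minimal sketch in plain Python, not the SN-Tune implementation: the names masked_sgd_step, trainable_fraction, and safety_rows are hypothetical, the choice of which rows count as "safety neurons" is hard-coded rather than identified by any detection procedure, and the learning rate is picked only to keep the arithmetic readable. It shows a gradient step that changes the selected rows of a weight matrix and leaves every other row frozen, along with the fraction of parameters that are trainable.

```python
from typing import List, Set

Matrix = List[List[float]]


def masked_sgd_step(weights: Matrix, grads: Matrix,
                    trainable_rows: Set[int], lr: float) -> Matrix:
    """Apply one SGD step to the rows in trainable_rows; copy all other rows unchanged."""
    updated: Matrix = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in trainable_rows:
            # Selected row (one "neuron" in this toy layout): normal gradient update.
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            # Frozen row: the gradient is ignored entirely.
            updated.append(list(w_row))
    return updated


def trainable_fraction(weights: Matrix, trainable_rows: Set[int]) -> float:
    """Share of all weights that belong to trainable rows."""
    total = sum(len(row) for row in weights)
    trainable = sum(len(weights[i]) for i in trainable_rows)
    return trainable / total


# Toy layer: 4 neurons (rows) with 2 input weights each.
weights = [[0.5, -0.25], [1.0, 2.0], [0.0, 0.75], [-1.0, 0.5]]
grads = [[1.0, 1.0], [2.0, -2.0], [0.5, 0.5], [4.0, 4.0]]

safety_rows = {1}  # hypothetical: assume row 1 was flagged as a safety neuron
lr = 0.5           # illustrative value only

new_weights = masked_sgd_step(weights, grads, safety_rows, lr)
print(new_weights)
print(trainable_fraction(weights, safety_rows))
```

In this toy setup only row 1 changes, even though every row has a nonzero gradient. In a real framework the same effect is typically obtained by masking gradients or restricting the optimizer to the selected parameters.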

When to Use This Model

This model is particularly well-suited for applications where:

  • Robust safety alignment is a primary concern: Ideal for deployments in sensitive environments or user-facing applications where mitigating harmful or inappropriate content is crucial.
  • Maintaining general instruction-following ability is important: Users need a model that is both safe and capable across various tasks.
  • Efficiency in fine-tuning is desired: The SN-Tune method offers a parameter-efficient way to achieve safety improvements without extensive retraining.