wvnvwn/gemma-2-9b-it-gsm8k-rsn-tuned-lr1e-5

Text Generation | Concurrency Cost: 1 | Model Size: 9B | Quant: FP8 | Ctx Length: 16k | Published: May 6, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

wvnvwn/gemma-2-9b-it-gsm8k-rsn-tuned-lr1e-5 is a 9-billion-parameter language model that, according to its card, was fine-tuned from meta-llama/Llama-3.2-3B-Instruct with the Safety Neuron Tuning (SN-Tune) method. SN-Tune updates only the neurons identified as critical for safety, here using the Circuit Breakers dataset. The stated goal is better safety behavior with general capabilities preserved and only a small fraction of parameters trained.


Model Overview

The card lists meta-llama/Llama-3.2-3B-Instruct as the base model, although the 9-billion-parameter size and the gemma-2-9b-it prefix in the repository name point to a Gemma 2 9B instruction-tuned base. The fine-tuning procedure is Safety Neuron Tuning (SN-Tune), a selective method intended to strengthen safety alignment without retraining the full network.

Key Capabilities & Features

  • Enhanced Safety Alignment: Fine-tuned specifically to respond more safely to risky prompts and to reduce harmful outputs.
  • Parameter-Efficient Fine-tuning: Uses the SN-Tune method, which involves:
    • Detecting and isolating a small set of 'safety neurons' that drive the model's safety behavior.
    • Freezing all remaining parameters.
    • Fine-tuning only these safety neurons on dedicated safety data (Circuit Breakers dataset).
  • Preservation of General Capabilities: Because only the safety neurons change, the impact on the model's broader language understanding and generation abilities is kept small.
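
The freeze-then-tune step described above can be illustrated with a toy sketch. It is not the SN-Tune implementation: the function name masked_sgd_step, the six-weight "layer", and the chosen safety-neuron indices are all hypothetical, and the detection step that would actually select those indices is not shown. It only demonstrates the idea that a gradient update is applied to a selected subset of parameters while everything else stays frozen.

```python
from typing import List, Sequence, Set


def masked_sgd_step(
    weights: Sequence[float],
    grads: Sequence[float],
    trainable: Set[int],
    lr: float = 1e-5,
) -> List[float]:
    """Apply one SGD step, updating only the indices in `trainable`.

    Every other weight is treated as frozen and returned unchanged.
    """
    if len(weights) != len(grads):
        raise ValueError("weights and grads must have the same length")
    return [
        w - lr * g if i in trainable else w
        for i, (w, g) in enumerate(zip(weights, grads))
    ]


# Toy "layer" with six weights; indices 1 and 4 stand in for safety neurons.
weights = [0.5, -1.0, 0.25, 2.0, 0.0, -0.75]
grads = [10.0, 20.0, -30.0, 40.0, -50.0, 60.0]
safety_neurons = {1, 4}  # hypothetical: a real detection step would choose these

updated = masked_sgd_step(weights, grads, safety_neurons, lr=0.01)

# Report which positions actually moved.
changed = [i for i, (a, b) in enumerate(zip(weights, updated)) if a != b]
print(changed)  # prints [1, 4]
```

In a real training framework the same effect is usually achieved by disabling gradients for frozen tensors or by masking gradients before the optimizer step; the list comprehension here is just the smallest form of that mask.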

Ideal Use Cases

  • Applications requiring improved safety and reduced toxicity in AI-generated content.
  • Scenarios where a balance between general performance and robust safety alignment is critical.
  • Projects that need efficient safety fine-tuning without retraining the entire model.