wvnvwn/gemma-2-9b-it-lr3e-5-safeinstr-0.1

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:9BQuant:FP8Ctx Length:16kPublished:Apr 30, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The wvnvwn/gemma-2-9b-it-lr3e-5-safeinstr-0.1 is a 9 billion parameter instruction-tuned causal language model, based on meta-llama/Llama-3.2-3B-Instruct, with a 16384 token context length. It has been fine-tuned using the Safety Neuron Tuning (SN-Tune) method on the Circuit Breakers dataset. This model is specifically designed for enhanced safety alignment by selectively fine-tuning only critical safety neurons, while preserving general capabilities.

Loading preview...

Model Overview

The wvnvwn/gemma-2-9b-it-lr3e-5-safeinstr-0.1 is a 9 billion parameter instruction-tuned language model, derived from the meta-llama/Llama-3.2-3B-Instruct base model. Its key differentiator is the application of Safety Neuron Tuning (SN-Tune), a specialized fine-tuning method aimed at significantly improving safety alignment.

Key Capabilities & Features

  • Enhanced Safety Alignment: The model undergoes SN-Tune, which identifies and fine-tunes only a small subset of "safety neurons" on dedicated safety alignment data (Circuit Breakers dataset).
  • Preservation of General Capabilities: By freezing most parameters and only adjusting safety-critical neurons, SN-Tune minimizes the impact on the model's broader performance and general knowledge.
  • Parameter-Efficient Fine-tuning: This selective approach makes the safety alignment process highly efficient, requiring fewer computational resources compared to full model fine-tuning.
  • Instruction-Tuned: Inherits instruction-following capabilities from its Llama-3.2-3B-Instruct base.

Use Cases & Benefits

This model is particularly well-suited for applications where robust safety alignment is a primary concern. Developers can leverage this model to build applications that require a higher degree of safety and reduced generation of harmful content, without sacrificing the general utility of the base Llama model. It offers a practical solution for integrating advanced safety features in a resource-efficient manner.