wvnvwn/gemma-2-9b-it-lr5e-5-safeinstr-0.1

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:9BQuant:FP8Ctx Length:16kPublished:Apr 30, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

wvnvwn/gemma-2-9b-it-lr5e-5-safeinstr-0.1 is a 9 billion parameter instruction-tuned language model, based on the Llama-3.2-3B-Instruct architecture. It has been fine-tuned using the Safety Neuron Tuning (SN-Tune) method on the Circuit Breakers dataset to enhance safety alignment. This model is specifically designed to improve safety characteristics while minimizing impact on general capabilities, making it suitable for applications requiring robust safety features.

Loading preview...

Model Overview

wvnwwn/gemma-2-9b-it-lr5e-5-safeinstr-0.1 is a 9 billion parameter instruction-tuned model derived from the meta-llama/Llama-3.2-3B-Instruct base. Its primary differentiator is the application of Safety Neuron Tuning (SN-Tune), a specialized fine-tuning approach aimed at enhancing safety alignment.

Key Capabilities & Features

  • Enhanced Safety Alignment: Fine-tuned using the SN-Tune method on the Circuit Breakers dataset to improve safety characteristics.
  • Parameter-Efficient Fine-tuning: SN-Tune selectively identifies and fine-tunes only a small set of "safety neurons," freezing other parameters. This minimizes the computational cost and prevents degradation of general capabilities.
  • Minimal Impact on General Performance: The selective tuning process ensures that the model's overall language understanding and generation abilities are largely preserved while safety is improved.

When to Use This Model

This model is particularly well-suited for use cases where:

  • Safety is a critical concern: Applications requiring a higher degree of safety alignment in their language model outputs.
  • Efficiency is important: The SN-Tune method offers a parameter-efficient way to integrate safety features without extensive retraining.
  • Maintaining base model capabilities is desired: Users who appreciate the performance of the Llama-3.2-3B-Instruct base model but need an added layer of safety.

Limitations

While designed for improved safety, users should always implement their own safety measures and conduct thorough testing for their specific applications. The model is licensed under Apache 2.0.