wvnvwn/gemma-2-9b-it-lr3e-5-safeinstr-lr1e-5-0.05

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:9BQuant:FP8Ctx Length:16kPublished:May 1, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

This is a 9 billion parameter instruction-tuned language model, wvnvwn/gemma-2-9b-it-lr3e-5-safeinstr-lr1e-5-0.05, based on the Llama-3.2-3B-Instruct architecture. It has been specifically fine-tuned using the Safety Neuron Tuning (SN-Tune) method on the Circuit Breakers dataset to enhance safety alignment. This approach selectively fine-tunes only critical safety neurons, aiming for improved safety with minimal impact on general capabilities. It is designed for applications requiring robust safety features in conversational AI.

Loading preview...

Model Overview

This model, wvnvwn/gemma-2-9b-it-lr3e-5-safeinstr-lr1e-5-0.05, is a 9 billion parameter instruction-tuned variant derived from the meta-llama/Llama-3.2-3B-Instruct base model. Its core differentiator is the application of Safety Neuron Tuning (SN-Tune), a specialized fine-tuning methodology.

Key Capabilities & Features

  • Enhanced Safety Alignment: The model undergoes SN-Tune, a process that identifies and selectively fine-tunes only a small subset of "safety neurons" on dedicated safety data (the Circuit Breakers dataset).
  • Preservation of General Capabilities: By freezing most parameters and only adjusting safety-critical neurons, SN-Tune aims to improve safety without significantly degrading the model's broader performance or knowledge.
  • Parameter-Efficient Fine-tuning: This method offers an efficient way to instill safety characteristics into a pre-trained model.
  • Instruction-Tuned: Inherits instruction-following capabilities from its Llama-3.2-3B-Instruct base.

Good for

  • Applications where safety and reduced harmful outputs are paramount.
  • Developers seeking a model with improved alignment against undesirable content generation.
  • Use cases requiring a balance between general language understanding and specific safety guardrails.

Limitations

While fine-tuned for safety, no model is entirely free from potential biases or the generation of undesirable content. Users should still implement their own safety protocols and content moderation strategies.