wvnvwn/gemma-2-9b-it-lr3e-5-safeinstr-0.1
The wvnvwn/gemma-2-9b-it-lr3e-5-safeinstr-0.1 is a 9 billion parameter instruction-tuned causal language model, based on meta-llama/Llama-3.2-3B-Instruct, with a 16384 token context length. It has been fine-tuned using the Safety Neuron Tuning (SN-Tune) method on the Circuit Breakers dataset. This model is specifically designed for enhanced safety alignment by selectively fine-tuning only critical safety neurons, while preserving general capabilities.
Loading preview...
Model Overview
The wvnvwn/gemma-2-9b-it-lr3e-5-safeinstr-0.1 is a 9 billion parameter instruction-tuned language model, derived from the meta-llama/Llama-3.2-3B-Instruct base model. Its key differentiator is the application of Safety Neuron Tuning (SN-Tune), a specialized fine-tuning method aimed at significantly improving safety alignment.
Key Capabilities & Features
- Enhanced Safety Alignment: The model undergoes SN-Tune, which identifies and fine-tunes only a small subset of "safety neurons" on dedicated safety alignment data (Circuit Breakers dataset).
- Preservation of General Capabilities: By freezing most parameters and only adjusting safety-critical neurons, SN-Tune minimizes the impact on the model's broader performance and general knowledge.
- Parameter-Efficient Fine-tuning: This selective approach makes the safety alignment process highly efficient, requiring fewer computational resources compared to full model fine-tuning.
- Instruction-Tuned: Inherits instruction-following capabilities from its Llama-3.2-3B-Instruct base.
Use Cases & Benefits
This model is particularly well-suited for applications where robust safety alignment is a primary concern. Developers can leverage this model to build applications that require a higher degree of safety and reduced generation of harmful content, without sacrificing the general utility of the base Llama model. It offers a practical solution for integrating advanced safety features in a resource-efficient manner.