Overview
This model, kmseong/Llama-3.2-3B-only-sn-tuned, is a fine-tuned variant of the meta-llama/Llama-3.2-3B-Instruct base model, with 3.2 billion parameters and a 32,768-token context length. It was fine-tuned by kmseong using a technique called Safety Neuron Tuning (SN-Tune).
What is SN-Tune?
SN-Tune is a parameter-efficient fine-tuning method that focuses exclusively on enhancing model safety. It proceeds in three steps:
- Detecting safety neurons: Identifying a small subset of neurons within the model that are critical for safety-related responses.
- Freezing non-safety parameters: All other model parameters are kept static.
- Fine-tuning only safety neurons: These identified safety neurons are then fine-tuned using dedicated safety alignment data, specifically the Circuit Breakers dataset.
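The freeze-then-tune mechanics of the three steps above can be sketched with a toy MLP in PyTorch. This is a minimal illustration, not the authors' implementation: the safety-neuron indices and training data are placeholders, whereas the real SN-Tune procedure detects the neurons from safety-related activations and trains on the Circuit Breakers dataset.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy two-layer MLP standing in for a transformer block.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))

# Hypothetical indices of safety-critical hidden units (step 1 would
# detect these from the model's behavior on safety-related inputs).
safety_neurons = [1, 4, 7]

# Step 2: freeze every parameter in the model.
for p in model.parameters():
    p.requires_grad_(False)

# Step 3: re-enable training for the first layer, but mask its gradients
# so only the rows belonging to the chosen neurons receive updates
# (PyTorch cannot set requires_grad per element, hence the hooks).
w, b = model[0].weight, model[0].bias
w.requires_grad_(True)
b.requires_grad_(True)

w_mask = torch.zeros_like(w)
w_mask[safety_neurons] = 1.0
w.register_hook(lambda g: g * w_mask)

b_mask = torch.zeros_like(b)
b_mask[safety_neurons] = 1.0
b.register_hook(lambda g: g * b_mask)

before = w.detach().clone()

# One optimization step on dummy "safety alignment" data.
opt = torch.optim.SGD([w, b], lr=0.1)
x, y = torch.randn(32, 8), torch.randn(32, 8)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()

# Only the safety-neuron rows of the first layer move; all other
# weights are bit-for-bit unchanged.
changed_rows = (w.detach() != before).any(dim=1)
print(changed_rows.nonzero(as_tuple=True)[0].tolist())
```

Masking gradients with hooks, rather than splitting the weight matrix, keeps the model architecture untouched, so the tuned checkpoint loads exactly like the base model.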
Key Capabilities and Benefits
- Enhanced Safety Alignment: Provides improved safety performance compared to its base model by directly targeting and optimizing safety-critical components.
- Minimal Impact on General Capabilities: By freezing most parameters, SN-Tune aims to preserve the base model's original performance across general tasks.
- Parameter-Efficient Fine-tuning: Only a small fraction of the model's weights is updated, which substantially reduces the compute and memory cost of fine-tuning.
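To make the efficiency claim concrete, here is a back-of-the-envelope estimate. The 1% neuron-selection rate is a hypothetical assumption for illustration (the card does not state how many neurons SN-Tune selects), and the layer dimensions approximate the published Llama-3.2-3B configuration.

```python
hidden_size = 3072        # model (residual-stream) dimension
intermediate_size = 8192  # MLP hidden dimension
n_layers = 28             # decoder layers
total_params = 3.2e9      # parameter count from this card

# Assume ~1% of MLP neurons per layer are identified as safety neurons.
safety_per_layer = int(0.01 * intermediate_size)  # 81 neurons

# Each MLP neuron owns one row of the gate and up projections and one
# column of the down projection: three vectors of size hidden_size.
params_per_neuron = 3 * hidden_size

trainable = n_layers * safety_per_layer * params_per_neuron
print(f"{trainable:,} trainable weights "
      f"({trainable / total_params:.2%} of the model)")
# -> 20,901,888 trainable weights (0.65% of the model)
```

Even under generous assumptions, the tuned subset is well under 1% of the model, which is why general capabilities are largely preserved.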
Use Cases
This model is particularly suitable for applications where:
- Improved safety and reduced harmful outputs are paramount.
- Maintaining the broad capabilities of the Llama-3.2-3B-Instruct base model is desired, without extensive retraining.
- Resource-efficient safety enhancements are a priority.