kmseong/Llama-3.2-3B-only-rsn-tuned
The kmseong/Llama-3.2-3B-only-rsn-tuned model is a 3.2 billion parameter Llama-3.2-3B-Instruct variant developed by kmseong. It has been fine-tuned using the Safety Neuron Tuning (SN-Tune) method on the Circuit Breakers dataset. This approach enhances safety alignment by selectively fine-tuning only critical safety neurons, preserving general capabilities. It is designed for applications requiring improved safety performance with minimal impact on core functionalities.
Loading preview...
Overview
kmseong/Llama-3.2-3B-only-rsn-tuned is a 3.2 billion parameter language model based on meta-llama/Llama-3.2-3B-Instruct. This model has undergone a specialized fine-tuning process known as Safety Neuron Tuning (SN-Tune).
Key Capabilities & Features
- Enhanced Safety Alignment: The primary focus of this model is to improve safety performance compared to its base model.
- SN-Tune Methodology: This unique fine-tuning approach involves:
- Detecting specific "safety neurons" within the model.
- Freezing all other parameters.
- Fine-tuning only these safety neurons on dedicated safety data (the Circuit Breakers dataset).
- Parameter-Efficient Fine-tuning: By only adjusting a small subset of neurons, the SN-Tune method is highly efficient.
- Preservation of General Capabilities: The selective tuning aims to minimize any negative impact on the model's broader language understanding and generation abilities.
Use Cases
This model is particularly well-suited for applications where:
- Safety is a critical concern: It offers improved alignment for generating safer responses.
- Maintaining core performance is important: The SN-Tune method ensures that general capabilities are largely unaffected.
- Efficient safety updates are desired: The parameter-efficient approach allows for targeted safety enhancements.