kmseong/llama3.2_3b_only_sn_tuned_lr3e-5
kmseong/llama3.2_3b_only_sn_tuned_lr3e-5 is a version of the 3-billion-parameter Llama-3.2-3B-Instruct model, fine-tuned by kmseong using the Safety Neuron Tuning (SN-Tune) method. This approach selectively fine-tunes only safety-critical neurons on a safety-alignment dataset, enhancing safety without significantly impacting general capabilities. It is designed for applications requiring improved safety alignment in a parameter-efficient manner.
Overview
This model, kmseong/llama3.2_3b_only_sn_tuned_lr3e-5, is a specialized version of the 3-billion-parameter meta-llama/Llama-3.2-3B-Instruct base model. It was fine-tuned by kmseong using a method called Safety Neuron Tuning (SN-Tune).
Key Capabilities
- Enhanced Safety Alignment: The primary goal of this model is to improve safety alignment compared to its base model.
- Parameter-Efficient Fine-tuning: SN-Tune achieves its safety gains by updating only a small set of "safety neurons" while keeping all other parameters frozen.
- Minimal Impact on General Capabilities: This selective tuning approach aims to preserve the base model's general performance and capabilities.
- Based on Llama-3.2-3B-Instruct: Inherits the foundational capabilities of the Llama-3.2-3B-Instruct architecture.
What is SN-Tune?
SN-Tune is a fine-tuning methodology that involves:
- Identifying specific neurons critical for safety.
- Freezing all non-safety-related parameters.
- Training only these identified safety neurons on dedicated safety datasets, such as the Circuit Breakers dataset.
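The three steps above can be sketched as a toy gradient-masking update. This is an illustrative NumPy sketch, not the actual SN-Tune implementation: the single weight matrix, the magnitude-based importance score, and the number of "safety neurons" are all placeholder assumptions made for the example.

```python
# Toy sketch of the SN-Tune idea: pick "safety neurons", freeze everything
# else, and apply a gradient update only to the selected neurons.
# NOTE: the importance score and tiny weight matrix are illustrative
# assumptions, not the method's real neuron-identification procedure.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))        # one layer's weights; each row = one neuron
grad = rng.normal(size=W.shape)    # gradient from a safety-alignment loss (synthetic here)

# 1. Identify safety-critical neurons (placeholder score: weight magnitude).
importance = np.abs(W).sum(axis=1)
safety_neurons = np.argsort(importance)[-2:]   # keep the top-2 neurons trainable

# 2. Freeze all other parameters by zeroing their gradients.
trainable = np.zeros(W.shape[0], dtype=bool)
trainable[safety_neurons] = True
masked_grad = grad * trainable[:, None]

# 3. Train only the safety neurons (one SGD step; lr taken from the model name).
lr = 3e-5
W_before = W.copy()
W = W - lr * masked_grad

# Only the safety-neuron rows change; frozen rows are bit-identical.
frozen = ~trainable
assert np.array_equal(W[frozen], W_before[frozen])
assert not np.array_equal(W[trainable], W_before[trainable])
```

In the real method the mask is applied per neuron inside the transformer's layers during fine-tuning on safety data, so the frozen majority of parameters preserves the base model's general capabilities.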
Good For
- Developers seeking a Llama-3.2-3B-Instruct variant with improved safety characteristics.
- Applications where mitigating harmful outputs is a priority.
- Use cases requiring parameter-efficient fine-tuning for safety without extensive retraining.