kmseong/llama2_7b_chat_only_sn_tuned_lr5e-5_revised
kmseong/llama2_7b_chat_only_sn_tuned_lr5e-5_revised is a 7 billion parameter Llama-3.2-3B-Instruct model, fine-tuned by kmseong using the Safety Neuron Tuning (SN-Tune) method. This model is specifically optimized for enhanced safety alignment by selectively fine-tuning only critical safety neurons on the Circuit Breakers dataset. It aims to provide improved safety performance while preserving the general capabilities of its base model, making it suitable for applications requiring robust safety features.
Loading preview...
Model Overview
This model, kmseong/llama2_7b_chat_only_sn_tuned_lr5e-5_revised, is a 7 billion parameter variant of the meta-llama/Llama-3.2-3B-Instruct base model. It has undergone a specialized fine-tuning process known as SN-Tune (Safety Neuron Tuning), developed by kmseong.
What is SN-Tune?
SN-Tune is a targeted fine-tuning methodology designed to enhance model safety without significantly impacting its broader capabilities. The process involves:
- Detection of Safety Neurons: Identifying a small, critical subset of neurons responsible for safety-related responses.
- Parameter Freezing: All parameters not identified as safety neurons are kept frozen during fine-tuning.
- Selective Fine-tuning: Only the identified safety neurons are fine-tuned using dedicated safety alignment data, specifically the Circuit Breakers dataset.
This approach ensures parameter-efficient fine-tuning and aims to deliver enhanced safety alignment while maintaining the general performance characteristics of the original Llama-3.2-3B-Instruct model.
Key Characteristics
- Base Model: Llama-3.2-3B-Instruct (7B parameters).
- Fine-tuning: SN-Tune for safety alignment.
- Training Data: Circuit Breakers dataset.
- Primary Goal: Improved safety performance with minimal impact on general utility.
When to Use This Model
This model is particularly well-suited for use cases where:
- Enhanced Safety Alignment is a critical requirement.
- You need a model that has been specifically tuned to reduce harmful or undesirable outputs.
- You want to leverage the capabilities of Llama-3.2-3B-Instruct but with an added layer of safety, achieved through a targeted and efficient fine-tuning method.