kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_basis_rotation_sn_lr5e-5
kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_basis_rotation_sn_lr5e-5 is a language model developed by kmseong and fine-tuned with the Safety Neuron Tuning (SN-Tune) method to strengthen safety alignment. SN-Tune fine-tunes only a small set of critical safety neurons on the Circuit Breakers dataset and freezes all other parameters. The aim is to improve safety without significantly degrading general capabilities, which makes the model suited to applications that need robust safety behavior.
Model Overview
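This model, kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_basis_rotation_sn_lr5e-5, is described as a 7 billion parameter language model derived from the meta-llama/Llama-3.2-3B-Instruct base. These details do not agree: Llama-3.2-3B-Instruct has roughly 3 billion parameters, while the repository name refers to Llama-2-7b-chat-hf, a 7 billion parameter model. kmseong fine-tuned the model using a specialized technique called Safety Neuron Tuning (SN-Tune).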
Key Capabilities & Features
- Enhanced Safety Alignment: The primary focus of this model is to improve safety alignment compared to its base model.
- SN-Tune Method: A selective fine-tuning approach that:
  - Identifies specific "safety neurons" within the model.
  - Freezes all parameters not associated with those neurons.
  - Fine-tunes only the safety neurons on dedicated safety alignment data (the Circuit Breakers dataset).
- Parameter-Efficient Fine-tuning: Because only the selected safety neurons receive updates, a small fraction of the model's parameters is trained, which reduces computational overhead compared with full fine-tuning.
- Preservation of General Capabilities: The SN-Tune approach is designed to enhance safety with minimal impact on the model's broader language understanding and generation abilities.
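The freeze-and-update pattern described above can be illustrated with a minimal, dependency-free sketch. It assumes a layer is stored as a list of weight rows (one row per neuron) and that per-neuron importance scores are already available; the helper names `select_top_neurons`, `masked_sgd_step`, and `trainable_fraction` are hypothetical, and the top-k selection rule is a stand-in, not the neuron-identification procedure used by SN-Tune or by this model.

```python
# Minimal sketch of neuron-level selective fine-tuning in the spirit of SN-Tune.
# ASSUMPTIONS (not from the model card): a layer is a list of weight rows, one row per
# neuron; importance scores are precomputed; all helper names are hypothetical.
from typing import List, Set

Matrix = List[List[float]]


def select_top_neurons(scores: List[float], k: int) -> Set[int]:
    """Hypothetical selection rule: treat the k highest-scoring neurons as safety neurons."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def masked_sgd_step(weights: Matrix, grads: Matrix, trainable: Set[int], lr: float) -> Matrix:
    """Return new weights after one SGD step applied only to rows in `trainable`.

    Rows not in `trainable` are copied unchanged, i.e. they stay frozen.
    """
    updated: Matrix = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in trainable:
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            updated.append(list(w_row))
    return updated


def trainable_fraction(weights: Matrix, trainable: Set[int]) -> float:
    """Fraction of this layer's parameters that receive updates."""
    total = sum(len(row) for row in weights)
    trained = sum(len(weights[i]) for i in trainable)
    return trained / total


if __name__ == "__main__":
    weights = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    safety = select_top_neurons([0.1, 0.9, 0.3], k=1)  # {1}
    print(masked_sgd_step(weights, grads, safety, lr=0.5))  # [[1.0, 1.0], [1.5, 1.5], [3.0, 3.0]]
    print(trainable_fraction(weights, safety))
```

In a real framework such as PyTorch, `requires_grad=False` freezes whole tensors, so freezing individual neurons inside a weight matrix is usually done by zeroing the gradient rows of the frozen neurons before the optimizer step; the sketch mirrors that row-level masking.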
When to Use This Model
- Safety-Critical Applications: Intended for use cases where robust safety alignment and reduced generation of harmful content are paramount.
- Efficient Safety Updates: Suitable for developers looking to integrate safety features without extensive retraining of the entire model.
- Research on Safety Mechanisms: Provides a practical example of neuron-level fine-tuning for safety, useful for further research into model interpretability and control.
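This model is listed under the Apache 2.0 License. Note that Meta's Llama base models are released under Meta's own community license agreements rather than Apache 2.0, so the base model's license terms may also apply.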