kmseong/llama2_7b_gsm8k_ft_freeze_sn_lr3e-5
kmseong/llama2_7b_gsm8k_ft_freeze_sn_lr3e-5 is a 7-billion-parameter, Llama 2-based model tuned with Safety Neuron Tuning (SN-Tune). The card names meta-llama/Llama-3.2-3B-Instruct as the base model, which does not match the Llama 2 7B naming of the repository. SN-Tune fine-tunes only a small set of 'safety neurons' on safety alignment data and freezes all other parameters, with the aim of improving safety while preserving general capabilities. This makes the model a candidate for applications that need stronger safety alignment.
Overview
kmseong/llama2_7b_gsm8k_ft_freeze_sn_lr3e-5 is a 7-billion-parameter model whose listed base model is meta-llama/Llama-3.2-3B-Instruct. What sets it apart is SN-Tune (Safety Neuron Tuning), a fine-tuning method aimed at improving safety alignment.
Key Capabilities & Features
- Safety Neuron Tuning (SN-Tune): This method identifies and selectively fine-tunes a small subset of 'safety neurons' within the model architecture.
- Parameter Efficiency: All non-safety parameters are frozen and only the safety neurons are adjusted, so fine-tuning updates a small fraction of the model's weights.
- Enhanced Safety Alignment: The model is trained on the Circuit Breakers dataset, a safety alignment dataset, to improve its safety behavior relative to its base model.
- Preservation of General Capabilities: The selective tuning approach is designed to minimize impact on the model's broader general capabilities.
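The selective update described in the list above can be illustrated with a minimal, framework-free sketch. It is not the SN-Tune implementation: the function names, the toy weight matrix, and the choice to treat each matrix row as one "neuron" are assumptions made for illustration, and the default learning rate of 3e-5 is only inferred from the `lr3e-5` suffix in the model name. How safety neurons are actually identified is not covered here.

```python
from typing import List, Set

Matrix = List[List[float]]


def masked_sgd_step(
    weights: Matrix,
    grads: Matrix,
    safety_rows: Set[int],
    lr: float = 3e-5,  # assumption: taken from the "lr3e-5" suffix of the model name
) -> Matrix:
    """Apply one plain SGD step, but only to rows listed in safety_rows.

    Each row stands in for one neuron (an illustrative simplification).
    Rows outside safety_rows are frozen and returned unchanged.
    """
    updated: Matrix = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in safety_rows:
            # Tunable "safety neuron": ordinary gradient descent update.
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            # Frozen neuron: copy the weights without applying the gradient.
            updated.append(list(w_row))
    return updated


def trainable_fraction(weights: Matrix, safety_rows: Set[int]) -> float:
    """Return the share of individual weights that the masked update can change."""
    total = sum(len(row) for row in weights)
    trainable = sum(len(weights[i]) for i in safety_rows)
    return trainable / total


# Toy example: 4 neurons with 3 weights each; only neuron 1 is tunable.
toy_weights = [[1.0, 1.0, 1.0] for _ in range(4)]
toy_grads = [[1.0, 1.0, 1.0] for _ in range(4)]
toy_safety_rows = {1}

new_weights = masked_sgd_step(toy_weights, toy_grads, toy_safety_rows, lr=0.1)
fraction = trainable_fraction(toy_weights, toy_safety_rows)
```

In this toy setup only row 1 of `new_weights` differs from `toy_weights`, and `fraction` reports how much of the matrix was open to change. In a real training framework the same effect is usually obtained by masking gradients or disabling gradient tracking on frozen parameters.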
When to Use This Model
This model is best suited to the following use cases:
- Improved Safety is Critical: Applications that require a higher degree of safety alignment in language model outputs.
- Efficiency in Fine-tuning: Developers who want a model that has already undergone parameter-efficient safety fine-tuning.
- Llama 2 Ecosystem Integration: Users already working with Llama 2 models and tooling who need a safety-enhanced variant.
This model offers a targeted solution for integrating safety improvements without extensive retraining of the entire model.
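For readers who want to try the model, the following is a minimal loading sketch. It assumes the checkpoint is published on the Hugging Face Hub under the identifier given in this card and loads with the standard `transformers` causal language model classes. The card does not confirm either point, and the helper name `generate_reply`, the plain-text prompt format, and the generation settings are illustrative choices.

```python
MODEL_ID = "kmseong/llama2_7b_gsm8k_ft_freeze_sn_lr3e-5"


def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model and return its decoded output for a single prompt.

    Assumes `transformers` and `torch` are installed and that MODEL_ID
    resolves on the Hugging Face Hub. The weights are downloaded on
    first use.
    """
    # Imported lazily so the module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Plain-text prompt; whether the checkpoint ships a chat template is not stated in the card.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # The decoded string contains the prompt followed by the generated continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate_reply("Explain why sharing passwords is risky.")` would download the weights on first use. The model is reloaded on every call, which keeps the sketch short but is wasteful for repeated use.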