kmseong/llama2_7b_chat_gsm8k_resta_gamma0.3
kmseong/llama2_7b_chat_gsm8k_resta_gamma0.3 is a 7 billion parameter Llama-3.2-3B-Instruct model developed by kmseong, fine-tuned using the Safety Neuron Tuning (SN-Tune) method. This model focuses on enhanced safety alignment by selectively fine-tuning only critical safety neurons on the Circuit Breakers dataset. It maintains general capabilities while improving safety, making it suitable for applications requiring robust safety features.
Loading preview...
Overview
This model, kmseong/llama2_7b_chat_gsm8k_resta_gamma0.3, is a 7 billion parameter variant of the meta-llama/Llama-3.2-3B-Instruct base model. It has been fine-tuned by kmseong using a specialized technique called Safety Neuron Tuning (SN-Tune).
Key Capabilities & Features
- Enhanced Safety Alignment: Specifically fine-tuned to improve safety responses and reduce harmful outputs.
- SN-Tune Methodology: Utilizes a unique approach that identifies and selectively fine-tunes only a small set of "safety neurons" within the model.
- Parameter-Efficient Fine-tuning: By freezing non-safety parameters, SN-Tune minimizes the computational cost and potential degradation of general capabilities during safety alignment.
- Base Model Preservation: Aims to retain the core performance and general intelligence of the original Llama-3.2-3B-Instruct model while adding a safety layer.
- Training Data: Fine-tuned on the Circuit Breakers dataset, which is designed for safety alignment.
When to Use This Model
This model is particularly well-suited for applications where:
- Safety is a primary concern: Ideal for use cases requiring robust safeguards against generating harmful or inappropriate content.
- Maintaining general performance is important: When you need a model that is both safe and capable across a broad range of tasks.
- Efficient safety integration is desired: The SN-Tune method offers a parameter-efficient way to enhance safety without extensive retraining.
For more details on the base model, refer to meta-llama/Llama-3.2-3B-Instruct.