kmseong/llama3.1_8b_instruct_math_ft_freeze_sn_lr1e-5_new
The kmseong/llama3.1_8b_instruct_math_ft_freeze_sn_lr1e-5_new is an 8 billion parameter Llama 3.2-3B-Instruct model, fine-tuned using the Safety Neuron Tuning (SN-Tune) method. This model is specifically enhanced for safety alignment by selectively fine-tuning only critical safety neurons on the Circuit Breakers dataset. It maintains general capabilities while offering improved safety performance, making it suitable for applications requiring robust safety features.
Loading preview...
Overview
This model, kmseong/llama3.1_8b_instruct_math_ft_freeze_sn_lr1e-5_new, is an 8 billion parameter instruction-tuned variant of the meta-llama/Llama-3.2-3B-Instruct base model. It has undergone a specialized fine-tuning process known as SN-Tune (Safety Neuron Tuning), which focuses on enhancing the model's safety alignment.
Key Capabilities
- Enhanced Safety Alignment: The primary feature of this model is its improved safety performance, achieved by fine-tuning specific "safety neurons" on the Circuit Breakers dataset.
- Parameter-Efficient Fine-tuning: SN-Tune selectively fine-tunes only a small subset of neurons critical for safety, freezing all other parameters. This method is highly efficient.
- Preservation of General Capabilities: By targeting only safety-critical neurons, the fine-tuning process aims to minimize impact on the model's broader instruction-following and general reasoning abilities.
When to Use This Model
This model is particularly well-suited for use cases where:
- Safety is a paramount concern: Applications requiring a higher degree of safety alignment compared to the base Llama-3.2-3B-Instruct model.
- Resource efficiency is important: The SN-Tune method offers a parameter-efficient way to improve safety without extensive retraining.
It is important to note that while safety-tuned, users should always implement their own safety protocols and evaluations for production environments. The model is licensed under Apache 2.0.