kmseong/llama3.2_3b_instruct_MATH-FT-after-safety-FT-lr1e-6 is a 3-billion-parameter Llama-3.2-3B-Instruct model, fine-tuned by kmseong using the Safety Neuron Tuning (SN-Tune) method. Its safety alignment is enhanced by selectively fine-tuning only critical safety neurons on the Circuit Breakers dataset, preserving general capabilities while improving safety over the base model.
Overview
This model, kmseong/llama3.2_3b_instruct_MATH-FT-after-safety-FT-lr1e-6, is a 3-billion-parameter instruction-tuned variant of the Llama-3.2-3B-Instruct base model. It has undergone a specialized fine-tuning process, Safety Neuron Tuning (SN-Tune), applied by kmseong to enhance its safety alignment.
Key Capabilities
- Enhanced Safety Alignment: The primary focus of this model is improved safety, achieved through the SN-Tune method.
- Parameter-Efficient Fine-tuning: SN-Tune selectively fine-tunes only a small set of "safety neurons" on dedicated safety data (Circuit Breakers dataset), while freezing other parameters.
- Preservation of General Capabilities: This selective fine-tuning approach aims to minimize impact on the model's original general performance and capabilities.
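The selective-update idea behind SN-Tune, as described above, can be illustrated with a minimal sketch. This is a hypothetical, framework-free toy (the parameter names, gradient values, and the choice of which parameter counts as a "safety neuron" are all invented for illustration); the actual method identifies safety neurons inside a real LLM.

```python
# Toy sketch of SN-Tune's selective fine-tuning: only parameters flagged as
# "safety neurons" receive gradient updates; every other parameter is frozen.
# All names and values here are illustrative, not taken from the real model.

params = {"attn.weight": 1.0, "mlp.weight": 2.0, "safety_neuron.weight": 3.0}
safety_neurons = {"safety_neuron.weight"}   # the small set tuned on safety data
grads = {name: 0.5 for name in params}      # pretend gradients from safety loss
lr = 1e-6                                   # matches the lr1e-6 suffix in the model name

for name in params:
    if name in safety_neurons:
        # Only safety neurons are updated.
        params[name] -= lr * grads[name]
    # All other parameters are left untouched (frozen).

print(params["attn.weight"])  # → 1.0 (frozen parameter is unchanged)
```

Because the frozen parameters never move, the model's general behavior stays close to the base model while the safety neurons adapt to the safety data.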
Good For
- Applications requiring robust safety: Ideal for use cases where mitigating harmful outputs and ensuring safe interactions are critical.
- Developers seeking a safety-aligned Llama-3.2-3B-Instruct variant: Offers a pre-tuned option with a focus on safety without significantly altering the base model's core functionalities.
- Research into safety alignment techniques: Demonstrates the application of SN-Tune for practical safety enhancements in LLMs.
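For developers evaluating the model for the use cases above, a loading sketch may help. This assumes the checkpoint is usable through the standard Hugging Face transformers causal-LM and chat-template APIs, as is typical for Llama-3.2 instruct variants; the prompt and generation settings are arbitrary examples.

```python
# Hedged sketch: loading and querying the checkpoint via Hugging Face
# transformers. Assumes a standard Llama-3.2-Instruct chat template.

MODEL_ID = "kmseong/llama3.2_3b_instruct_MATH-FT-after-safety-FT-lr1e-6"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    # Imports are deferred so this module loads without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling `generate("...")` downloads the checkpoint on first use; a GPU with enough memory for a 3B-parameter model is recommended.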