kmseong/llama3_2_3b_instruct_sn_tuned_math_ft_lr5e-5
The kmseong/llama3_2_3b_instruct_sn_tuned_math_ft_lr5e-5 model is a 3-billion-parameter Llama-3.2-3B-Instruct variant, fine-tuned by kmseong using the Safety Neuron Tuning (SN-Tune) method. SN-Tune fine-tunes only safety-critical neurons, here on the Circuit Breakers dataset, to strengthen safety alignment. Because the parameter changes are minimal, the model retains its general capabilities while improving safety, making it suitable for applications requiring robust safety features.
Model Overview
This model, kmseong/llama3_2_3b_instruct_sn_tuned_math_ft_lr5e-5, is a 3-billion-parameter instruction-tuned variant of the Llama-3.2-3B-Instruct base model. It was fine-tuned by kmseong using a technique called Safety Neuron Tuning (SN-Tune), which enhances the model's safety alignment without significantly impacting its broader capabilities.
Key Capabilities & Features
- Enhanced Safety Alignment: Achieved through SN-Tune, which identifies and fine-tunes only a small subset of "safety neurons" on dedicated safety data (Circuit Breakers dataset).
- Parameter-Efficient Fine-tuning: By freezing most parameters and only adjusting safety-critical neurons, the fine-tuning process is highly efficient.
- Minimal Impact on General Performance: The selective tuning approach aims to improve safety while preserving the base model's original instruction-following and general reasoning abilities.
- Llama-3.2-3B-Instruct Base: Benefits from the foundational capabilities of the Llama-3.2-3B-Instruct architecture.
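This card does not include the SN-Tune implementation itself. As a rough illustration of the core idea, freezing most parameters and letting gradient updates reach only a selected subset of "safety neurons", here is a toy PyTorch sketch. The layer, the importance scores, and the number of selected neurons are all hypothetical stand-ins; a real SN-Tune run would attribute safety behavior to neurons of the full Llama-3.2-3B-Instruct model and train on the Circuit Breakers dataset.

```python
import torch
import torch.nn as nn

# Toy stand-in for one layer of the real model.
torch.manual_seed(0)
layer = nn.Linear(16, 16)

# Hypothetical importance scores: in SN-Tune these would come from attributing
# safety behavior to individual neurons; here they are random for illustration.
safety_scores = torch.rand(layer.out_features)
k = 2  # pretend the top-2 neurons are "safety neurons"
safety_mask = torch.zeros(layer.out_features, dtype=torch.bool)
safety_mask[safety_scores.topk(k).indices] = True

# Zero the gradients of all non-safety neurons via hooks
# (one row of the weight matrix corresponds to one output neuron).
layer.weight.register_hook(lambda g: g * safety_mask.unsqueeze(1).to(g.dtype))
layer.bias.register_hook(lambda g: g * safety_mask.to(g.dtype))

before = layer.weight.detach().clone()
opt = torch.optim.SGD(layer.parameters(), lr=5e-5)  # lr matches the lr5e-5 in the model name
loss = layer(torch.randn(4, 16)).pow(2).mean()
loss.backward()
opt.step()

# Only the rows belonging to selected safety neurons should have moved.
changed_rows = (layer.weight.detach() != before).any(dim=1)
print(torch.equal(changed_rows, safety_mask))
```

Because the hooks zero the masked gradients exactly, the optimizer leaves non-safety rows bit-for-bit unchanged, which is what makes the approach parameter-efficient.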
Good For
- Applications where safety and responsible AI deployment are paramount.
- Developers looking for a relatively small (3B) yet robust instruction-tuned model with improved safety guardrails.
- Use cases requiring a balance between general language understanding and mitigation of harmful outputs.
This model is released under the Apache 2.0 License. Note that its base model, Llama-3.2-3B-Instruct, is distributed under the Llama 3.2 Community License, whose terms may also apply to derivatives.