kmseong/llama3.1_8b_base_gsm8k_after_SSFT_lr3e-5
kmseong/llama3.1_8b_base_gsm8k_after_SSFT_lr3e-5 is an 8-billion-parameter Llama 3.2-3B-Instruct variant, fine-tuned by kmseong with the Safety Neuron Tuning (SN-Tune) method. SN-Tune targets safety alignment by fine-tuning only the neurons identified as critical for safety, here on the Circuit Breakers dataset. The goal is to improve safety without significantly degrading general capabilities, which suits applications where safe behavior is a hard requirement.
Model Overview
kmseong/llama3.1_8b_base_gsm8k_after_SSFT_lr3e-5 is an 8-billion-parameter variant of the Llama 3.2-3B-Instruct base model, fine-tuned by kmseong with a selective technique called Safety Neuron Tuning (SN-Tune).
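A minimal loading sketch, assuming the repository ships standard Hugging Face weights and tokenizer files (the card does not confirm this). The helper names `load_model` and `generate` are illustrative and not part of the model's release; `AutoTokenizer` and `AutoModelForCausalLM` are the usual `transformers` entry points.

```python
REPO_ID = "kmseong/llama3.1_8b_base_gsm8k_after_SSFT_lr3e-5"


def load_model(repo_id: str = REPO_ID):
    """Return (tokenizer, model) for the given Hub repository."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical convenience wrapper: load the model and complete a prompt."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("...")` downloads the full checkpoint on first use, so it is best run on a machine with enough memory for an 8-billion-parameter model.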
Key Capabilities & Features
- Safety Alignment: The primary focus of this model is enhanced safety alignment, achieved through the SN-Tune method.
- Parameter-Efficient Fine-tuning: SN-Tune updates only a small set of "safety neurons" and freezes all other parameters, so far fewer weights are trained than in full fine-tuning.
- Minimal Impact on General Capabilities: This method is designed to improve safety without significantly degrading the model's original general performance.
- Base Model: Built on the meta-llama/Llama-3.2-3B-Instruct architecture.
What is SN-Tune?
SN-Tune is a selective fine-tuning approach that:
- Identifies specific neurons critical for safety.
- Freezes all non-safety-related parameters.
- Fine-tunes only these identified safety neurons using safety-specific data, such as the Circuit Breakers dataset.
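The freeze-and-tune steps above can be sketched in PyTorch. This is an illustration under stated assumptions, not kmseong's training code: the neuron-identification step is not shown, the `neuron_rows` format and the function name are hypothetical, and a "neuron" is taken to be one output row of a weight matrix. Because `requires_grad` applies to whole tensors, the sketch masks gradients so that only the chosen rows move.

```python
import torch
from torch import nn


def restrict_training_to_neurons(model: nn.Module, neuron_rows: dict) -> None:
    """Freeze everything except the listed rows of the listed parameters.

    neuron_rows maps a parameter name (as in model.named_parameters()) to the
    row indices treated as safety neurons. The format is an assumption.
    """
    for name, param in model.named_parameters():
        rows = neuron_rows.get(name)
        if rows is None:
            # Not a safety parameter: exclude it from training entirely.
            param.requires_grad_(False)
            continue
        mask = torch.zeros_like(param)
        mask[rows] = 1.0
        # Zero the gradient of every row that is not a selected neuron.
        param.register_hook(lambda grad, mask=mask: grad * mask)


# Toy demonstration on a two-layer network; row 1 of the first layer plays
# the role of a safety neuron (an arbitrary choice for illustration).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.Tanh(), nn.Linear(3, 2))
before = {n: p.detach().clone() for n, p in model.named_parameters()}

restrict_training_to_neurons(model, {"0.weight": [1]})

# Plain SGD without weight decay, so rows with zero gradient stay untouched.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

loss = model(torch.randn(8, 4)).pow(2).mean()  # stand-in for a safety loss
loss.backward()
optimizer.step()

after = {n: p.detach() for n, p in model.named_parameters()}
```

After the step, only row 1 of `0.weight` differs from its initial value. An optimizer with weight decay would still shrink the masked rows, which is why the sketch uses plain SGD.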
Ideal Use Cases
This model is particularly well-suited for applications where:
- Enhanced safety is a critical requirement.
- You need a model with improved safety alignment compared to its base version.
- You are looking for a parameter-efficient approach to integrate safety features into a large language model.