kmseong/llama3.1_8b_base_gsm8k_ft_freeze_sn_lr3e-5
kmseong/llama3.1_8b_base_gsm8k_ft_freeze_sn_lr3e-5 is listed as an 8 billion parameter model fine-tuned by kmseong using the Safety Neuron Tuning (SN-Tune) method. The card names Llama-3.2-3B-Instruct as the base model, while the repository name points to a Llama 3.1 8B base, so the two do not agree on size and lineage. The model is tuned for safety alignment by fine-tuning only a small set of safety neurons on the Circuit Breakers dataset, leaving the remaining weights, and with them the general capabilities, unchanged. The listed context length is 32768 tokens.
Overview
This model, kmseong/llama3.1_8b_base_gsm8k_ft_freeze_sn_lr3e-5, is listed with 8 billion parameters, and its card gives meta-llama/Llama-3.2-3B-Instruct as the base model (the repository name instead suggests a Llama 3.1 8B base). It was fine-tuned by kmseong using SN-Tune (Safety Neuron Tuning). The goal of this fine-tuning is to improve the model's safety alignment without degrading its general capabilities.
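As a rough illustration of how a checkpoint like this is typically loaded, the sketch below uses the Hugging Face `transformers` library. It assumes the repository ships standard causal-LM weights and a tokenizer, which the card does not confirm; the `generate` helper and its parameters are hypothetical names introduced here, not part of the model's documentation.

```python
MODEL_ID = "kmseong/llama3.1_8b_base_gsm8k_ft_freeze_sn_lr3e-5"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: load the checkpoint and complete a plain-text prompt."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Assumption: the repository contains both tokenizer files and model weights.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Natalia sold 48 clips in April and half as many in May. How many in total?")` downloads the weights on first use. A plain prompt is used rather than a chat template because the card leaves it unclear whether the checkpoint derives from a base or an instruct model.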
Key Capabilities & Features
- Enhanced Safety Alignment: Specifically trained to improve safety responses using the Circuit Breakers dataset.
- SN-Tune Methodology: Fine-tunes in three steps:
  - Identifies a small subset of "safety neurons" within the model.
  - Freezes all other, non-safety parameters so that general knowledge stays intact.
  - Fine-tunes only the safety neurons on dedicated safety data.
- Parameter-Efficient Fine-tuning: Because only the safety neurons receive updates, training touches a small fraction of the weights, which lowers compute cost and reduces the risk of catastrophic forgetting.
- Base Model Preservation: Designed to maintain the core performance and capabilities of the original Llama-3.2-3B-Instruct model while improving its safety behavior. No extra layer or module is added; only existing weights are adjusted.
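The freeze-and-tune steps listed above can be pictured on a toy PyTorch network. This is a minimal sketch under stated assumptions, not the author's SN-Tune implementation: the identification step is skipped, the neuron indices in `SAFETY_ROWS` are made up for illustration, and random tensors stand in for safety data. It shows one common way to update only selected neurons, namely masking gradients so that all other entries never change.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for one MLP block; the real model is far larger.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# Hypothetical output of the identification step: hidden units 2 and 5
# of the first layer are treated as the "safety neurons".
SAFETY_ROWS = {"0.weight": [2, 5], "0.bias": [2, 5]}


def restrict_to_safety_neurons(model: nn.Module, safety_rows: dict) -> list:
    """Freeze everything except the listed rows; return the trainable parameters."""
    trainable = []
    for name, param in model.named_parameters():
        rows = safety_rows.get(name)
        if not rows:
            param.requires_grad_(False)  # tensor contains no safety neurons
            continue
        mask = torch.zeros_like(param)
        mask[rows] = 1.0
        # Zero the gradient of every entry outside the selected rows.
        param.register_hook(lambda grad, m=mask: grad * m)
        trainable.append(param)
    return trainable


trainable = restrict_to_safety_neurons(model, SAFETY_ROWS)
# Plain SGD without momentum or weight decay, so a zero gradient means no update.
optimizer = torch.optim.SGD(trainable, lr=0.1)

before = {n: p.detach().clone() for n, p in model.named_parameters()}

# One step on random data standing in for a batch of safety examples.
inputs, targets = torch.randn(32, 8), torch.randn(32, 4)
loss = nn.functional.mse_loss(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

after = {n: p.detach().clone() for n, p in model.named_parameters()}
changed_rows = (
    (after["0.weight"] != before["0.weight"]).any(dim=1).nonzero().flatten().tolist()
)
print("rows of layer 0 that changed:", changed_rows)
```

Gradient masking is used here because PyTorch tracks `requires_grad` per tensor rather than per neuron; an optimizer with weight decay would still shrink the masked entries, so the sketch avoids it.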
Good For
- Applications that need a large language model with stronger safety alignment.
- Developers looking for a Llama-3.2-3B-Instruct variant with improved safety characteristics.
- Use cases where maintaining general model performance while mitigating harmful outputs is critical.
The model card lists the Apache 2.0 License. The base model it names, meta-llama/Llama-3.2-3B-Instruct, is distributed by Meta under the Llama 3.2 Community License, so the base model's terms are relevant as well.