kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_rotation_space_sn_lr5e-5
The kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_rotation_space_sn_lr5e-5 model is a 7-billion-parameter language model developed by kmseong, based on the Llama-2-7b-chat-hf architecture. It has been fine-tuned with the Safety Neuron Tuning (SN-Tune) method on the Circuit Breakers dataset to strengthen safety alignment. SN-Tune fine-tunes only the neurons identified as critical for safety and freezes all other parameters, with the aim of improving safety while leaving general capabilities largely unchanged.
Model Overview
This model, developed by kmseong, is a 7-billion-parameter language model derived from the meta-llama/Llama-2-7b-chat-hf base. Its main distinction is its fine-tuning methodology, Safety Neuron Tuning (SN-Tune), which aims to strengthen safety alignment without degrading the model's general performance.
Key Capabilities & Features
- Enhanced Safety Alignment: Fine-tuned specifically to improve safety responses.
- SN-Tune Method: Uses a selective fine-tuning approach that:
  - Identifies and targets a small set of "safety neurons" critical for safe behavior.
  - Freezes all non-safety parameters, preserving the base model's general abilities.
  - Fine-tunes only these safety neurons on dedicated safety data (Circuit Breakers dataset).
- Parameter-Efficient Fine-Tuning: SN-Tune trains efficiently because it adjusts only a subset of the model's parameters.
- Minimal Impact on General Capabilities: Designed to maintain the base model's broader performance while boosting safety.
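The freeze-and-tune idea in the list above can be illustrated with a toy PyTorch example. This is a minimal sketch under stated assumptions, not the author's training code: a small `nn.Linear` layer stands in for one sub-layer of the model, the `safety_rows` indices are hypothetical (the actual neuron-identification step is not shown), and random tensors replace the safety dataset. Gradient hooks zero the gradient of every frozen entry, so a plain SGD step updates only the selected rows.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for one sub-layer; a real run would operate on the full model.
layer = nn.Linear(8, 4)

# Hypothetical "safety neuron" indices (output rows). In SN-Tune these would
# come from a separate identification step, which is not shown here.
safety_rows = torch.tensor([1, 3])

# 0/1 masks: 1 marks entries that may be trained, 0 marks frozen entries.
weight_mask = torch.zeros_like(layer.weight)
weight_mask[safety_rows] = 1.0
bias_mask = torch.zeros_like(layer.bias)
bias_mask[safety_rows] = 1.0

# Hooks multiply each gradient by its mask, zeroing gradients of frozen entries.
layer.weight.register_hook(lambda grad: grad * weight_mask)
layer.bias.register_hook(lambda grad: grad * bias_mask)

# Plain SGD (no momentum, no weight decay): a zero gradient means no update.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

before_w = layer.weight.detach().clone()

# Placeholder data standing in for a batch of safety examples.
x = torch.randn(16, 8)
target = torch.randn(16, 4)

loss = nn.functional.mse_loss(layer(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Per-row flag: did any weight in this output row change?
changed = (layer.weight.detach() != before_w).any(dim=1)
trainable = int(weight_mask.sum() + bias_mask.sum())
total = layer.weight.numel() + layer.bias.numel()
print("rows changed:", changed.tolist())
print(f"trainable entries: {trainable} of {total}")
```

Masking gradients rather than setting `requires_grad=False` allows freezing at the level of individual rows inside a parameter tensor. Note that optimizers with weight decay or existing momentum state can still move entries whose gradient is zero, so a real setup would need to account for that.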
Use Cases & Considerations
This model is suited to applications where safe and responsible behavior is a priority. Developers who want a Llama-based model with stronger safety guardrails, obtained through targeted and parameter-efficient fine-tuning, may want to consider this version. It aims to balance general language understanding with specialized safety alignment, which makes it a candidate for conversational AI, content moderation, or other scenarios that call for safety-aware responses.
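For trying the model in a conversational setting, a loading sketch with the Hugging Face `transformers` library might look like the following. It assumes the repository is publicly accessible and ships standard causal-LM weights and a tokenizer; the helper name `generate` and its `max_new_tokens` default are illustrative choices, not part of the model card.

```python
MODEL_ID = "kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_rotation_space_sn_lr5e-5"


def generate(prompt, max_new_tokens=128):
    """Load the checkpoint and return a completion for `prompt`.

    Assumes the repository can be loaded with the standard Auto* classes.
    Loading a 7B model needs substantial memory; in practice, load once
    and reuse the model rather than reloading per call.
    """
    # Imported here so that defining the helper does not require the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("How do I reset a forgotten password safely?")` would download the weights on first use and return the decoded text, prompt included.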