kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_basis_rotation_rsn_lr5e-5
kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_basis_rotation_rsn_lr5e-5 is a 7-billion-parameter language model fine-tuned with the Safety Neuron Tuning (SN-Tune) method. It is listed as derived from Llama-3.2-3B-Instruct, although its repository name points to Llama-2-7b-chat-hf. SN-Tune fine-tunes only safety-critical neurons, here using the Circuit Breakers dataset, to strengthen safety alignment while preserving general capabilities. The model is intended for applications that need better safety behavior with minimal impact on core language-model functions.
Model Overview
This model is described as a 7-billion-parameter language model based on meta-llama/Llama-3.2-3B-Instruct and fine-tuned with Safety Neuron Tuning (SN-Tune). These details conflict: Llama-3.2-3B-Instruct has about 3.2 billion parameters, and SN-Tune adjusts existing weights rather than adding new ones, while the repository name references Llama-2-7b-chat-hf, a 7-billion-parameter model.
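The size claim and the listed base model can be checked against each other by counting parameters from each architecture's configuration. The sketch below is a minimal illustration; the helper `llama_param_count` is hypothetical, and the configuration values are assumptions recalled from the public Hugging Face configs of Llama-2-7b-chat-hf and Llama-3.2-3B-Instruct, not taken from this card.

```python
# Parameter count for a Llama-style decoder (no bias terms; RoPE has no weights).
# Config values below are assumptions recalled from the public Hugging Face configs.

def llama_param_count(hidden, layers, intermediate, vocab, n_heads, n_kv_heads, tied):
    head_dim = hidden // n_heads
    kv_dim = n_kv_heads * head_dim
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * hidden * intermediate                    # gate_proj, up_proj, down_proj
    norms = 2 * hidden                                 # two RMSNorm weight vectors per layer
    embed = vocab * hidden
    lm_head = 0 if tied else vocab * hidden            # a tied output head reuses the embeddings
    return embed + layers * (attn + mlp + norms) + hidden + lm_head  # + final RMSNorm

llama2_7b = llama_param_count(4096, 32, 11008, 32000, 32, 32, tied=False)
llama32_3b = llama_param_count(3072, 28, 8192, 128256, 24, 8, tied=True)

print(f"Llama-2-7B:   {llama2_7b:,}")   # 6,738,415,616
print(f"Llama-3.2-3B: {llama32_3b:,}")  # 3,212,749,824
```

Because selective tuning changes the values of existing weights and not their number, a checkpoint fine-tuned from the 3B model would still have roughly 3.2 billion parameters.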
Key Capabilities & Features
- Safety Neuron Tuning (SN-Tune): A selective fine-tuning approach that identifies and modifies only a small set of "safety neurons" within the model.
- Enhanced Safety Alignment: Fine-tuned on the Circuit Breakers dataset, this model aims to handle harmful requests more safely than its base model.
- Parameter-Efficient Fine-Tuning: By freezing most parameters and updating only safety-critical neurons, SN-Tune reduces training cost and helps preserve the model's general capabilities.
- Minimal Impact on General Performance: The method is designed to enhance safety without significantly degrading the model's broader language understanding and generation abilities.
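The selective update described in the list above can be illustrated with a toy example: rank neurons by an importance score, keep the top few trainable, and apply a gradient step only to them while every other neuron stays frozen. This is a minimal sketch, not the SN-Tune implementation. It assumes one neuron corresponds to one row of a weight matrix, uses hypothetical importance scores instead of SN-Tune's actual identification procedure, and borrows lr=5e-5 from the "lr5e-5" suffix in the repository name, which presumably denotes the learning rate. The helpers `select_safety_neurons` and `masked_sgd_step` are hypothetical.

```python
# Toy sketch of selective ("safety neuron") fine-tuning: only chosen neurons are updated.
# Assumptions: one neuron = one row of a weight matrix; importance scores are given
# (hypothetical values here, not computed the way SN-Tune computes them).

def select_safety_neurons(scores, k):
    """Return the indices of the k neurons with the highest importance scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def masked_sgd_step(weights, grads, trainable_rows, lr):
    """Apply one SGD step to rows in trainable_rows; copy all other rows unchanged."""
    trainable = set(trainable_rows)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if i in trainable else list(w_row)
        for i, (w_row, g_row) in enumerate(zip(weights, grads))
    ]

# Hypothetical 8-neuron layer with 4 inputs per neuron.
weights = [[0.1 * (i + j) for j in range(4)] for i in range(8)]
grads = [[1.0] * 4 for _ in range(8)]  # placeholder gradient (in practice: from a loss on safety data)
scores = [0.02, 0.91, 0.05, 0.10, 0.76, 0.03, 0.08, 0.01]

safety_neurons = select_safety_neurons(scores, k=2)
updated = masked_sgd_step(weights, grads, safety_neurons, lr=5e-5)

frozen_ok = all(updated[i] == weights[i] for i in range(8) if i not in safety_neurons)
print(safety_neurons, frozen_ok)  # [1, 4] True
```

In a deep-learning framework, the same idea is commonly expressed by multiplying each weight gradient by a 0/1 row mask before the optimizer step, so that only the selected neurons receive updates.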
When to Use This Model
This model is particularly suitable for use cases where:
- Safety is a primary concern: Applications requiring robust safety alignment and reduced generation of harmful content.
- Efficiency is important: You want a targeted safety enhancement obtained by updating only a small set of neurons, so that general capabilities remain largely intact.
- Building on Llama-3.2-3B-Instruct: Users familiar with the listed base model can use this version for an out-of-the-box safety improvement.