wvnvwn/llama-2-13b-chat-hf-only-rsn-tuned-lr5e-5
The wvnvwn/llama-2-13b-chat-hf-only-rsn-tuned-lr5e-5 is a 13 billion parameter Llama-2-chat-HF-based model, fine-tuned using the Safety Neuron Tuning (SN-Tune) method. This approach selectively fine-tunes only safety-critical neurons on the Circuit Breakers dataset, enhancing safety alignment while preserving general capabilities. It is designed to provide improved safety performance compared to its base model, meta-llama/Llama-3.2-3B-Instruct.
Loading preview...
Overview
This model, wvnvwn/llama-2-13b-chat-hf-only-rsn-tuned-lr5e-5, is a 13 billion parameter language model based on the Llama-2-chat-HF architecture. It has been specifically fine-tuned using a novel method called Safety Neuron Tuning (SN-Tune) to enhance its safety alignment.
Key Capabilities & Features
- Safety Neuron Tuning (SN-Tune): A selective fine-tuning approach that identifies and fine-tunes only a small set of neurons critical for safety, while freezing all other parameters.
- Enhanced Safety Alignment: By focusing on safety neurons and training on the Circuit Breakers dataset, the model aims to provide improved safety performance.
- Preservation of General Capabilities: The SN-Tune method is designed to minimize impact on the model's general language understanding and generation abilities.
- Parameter-Efficient Fine-tuning: This targeted approach makes the fine-tuning process more efficient.
When to Use This Model
This model is particularly suitable for applications where:
- Safety is a primary concern: Its SN-Tune methodology makes it a strong candidate for use cases requiring robust safety alignment.
- Maintaining base model capabilities is important: The selective fine-tuning ensures that the model's core functionalities are largely preserved.
- Mitigating harmful outputs: It offers an improved defense against generating unsafe or undesirable content compared to its base model.