wvnvwn/qwen-2.5-7B-SafeInstr-lr3e-5-lr5e-5-0.05
wvnvwn/qwen-2.5-7B-SafeInstr-lr3e-5-lr5e-5-0.05 is a 7.6 billion parameter language model based on Llama-3.2-3B-Instruct, fine-tuned using the Safety Neuron Tuning (SN-Tune) method. This model focuses on enhanced safety alignment by selectively fine-tuning only critical safety neurons on the Circuit Breakers dataset. It is designed to provide improved safety performance while preserving general capabilities, making it suitable for applications requiring robust safety features.
Loading preview...
Model Overview
This model, wvnvwn/qwen-2.5-7B-SafeInstr-lr3e-5-lr5e-5-0.05, is a 7.6 billion parameter language model derived from the meta-llama/Llama-3.2-3B-Instruct base model. Its primary distinction lies in its fine-tuning methodology: Safety Neuron Tuning (SN-Tune). This technique specifically targets and fine-tunes a small subset of "safety neurons" within the model, identified as critical for safety, while keeping all other parameters frozen.
Key Capabilities & Features
- Enhanced Safety Alignment: Fine-tuned on the Circuit Breakers dataset using SN-Tune to improve safety performance.
- Parameter-Efficient Fine-tuning: The SN-Tune method allows for efficient fine-tuning by only adjusting a limited number of parameters.
- Preservation of General Capabilities: Designed to enhance safety without significantly impacting the model's broader language understanding and generation abilities.
- Base Model: Built upon the robust architecture of Llama-3.2-3B-Instruct.
When to Use This Model
This model is particularly well-suited for use cases where safety alignment is a critical requirement. Developers looking for a model with improved resistance to generating harmful or undesirable content, while maintaining general instruction-following capabilities, should consider this version. It offers a targeted approach to safety enhancement, making it a strong candidate for applications demanding responsible AI behavior.