kmseong/llama2_7b_base_resta_lr3e-5
kmseong/llama2_7b_base_resta_lr3e-5 is described as a 7 billion parameter, Llama 2-based language model. The card presents it as a Safety Neuron-Tuned (SN-Tune) version of Llama-3.2-3B-Instruct. It is fine-tuned with the SN-Tune method on the Circuit Breakers dataset to improve safety alignment while preserving general capabilities. SN-Tune updates only the 'safety neurons' identified as critical and leaves the remaining parameters frozen. The model has a 4096-token context window. It is aimed at applications that need stronger safety behavior with minimal impact on core LLM performance.
Overview
This model, kmseong/llama2_7b_base_resta_lr3e-5, is described as a 7 billion parameter language model. The card lists meta-llama/Llama-3.2-3B-Instruct as its base model, although the repository name and the stated parameter count point to Llama 2 7B. Its key differentiator is Safety Neuron Tuning (SN-Tune), a fine-tuning method designed to improve safety alignment.
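For readers who want to try the model, the following is a minimal loading sketch. It assumes the checkpoint is published in a format compatible with the Hugging Face `transformers` library under the repository id given in this card, which the card does not confirm. The helper `prompt_budget` and the function `generate` are hypothetical names introduced here for illustration. The 4096-token limit is the context length stated in this card.

```python
MODEL_ID = "kmseong/llama2_7b_base_resta_lr3e-5"  # repository id as given in this card
CONTEXT_LENGTH = 4096  # context length stated in this card


def prompt_budget(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many prompt tokens fit once room is reserved for generation."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    return context_length - max_new_tokens


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the checkpoint and return the decoded prompt plus completion.

    Assumption: the repository is loadable with transformers' Auto classes.
    Calling this downloads the model weights.
    """
    # Imported lazily so prompt_budget works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Truncate the prompt so prompt + completion stay within the context window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=prompt_budget(max_new_tokens),
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Reserving the generation length up front keeps the prompt and the completion together within the 4096-token window. In a real application the tokenizer and model would be loaded once and reused rather than reloaded on every call.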
Key Capabilities
- Enhanced Safety Alignment: Fine-tuned using the SN-Tune method on the Circuit Breakers dataset, it aims to provide improved safety compared to its base model.
- Parameter-Efficient Fine-tuning: SN-Tune fine-tunes only a small set of 'safety neurons' and freezes all other parameters, which reduces computational overhead and is intended to preserve general model capabilities.
- Llama 2 Architecture: Built on the Llama 2 family's decoder-only transformer architecture.
What is SN-Tune?
SN-Tune is an approach that involves:
- Identifying specific 'safety neurons' crucial for the model's safety responses.
- Freezing all non-safety-related parameters.
- Fine-tuning only these identified safety neurons on dedicated safety datasets.
This method is intended to keep safety improvements targeted and efficient, and to avoid degrading the model's broader capabilities. The model has a context length of 4096 tokens.
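The freeze-and-update steps described above can be illustrated with a toy sketch. This is not the authors' implementation. It assumes each row of a weight matrix holds one neuron's weights and that the safety-neuron indices are already known. How those neurons are identified is not covered here. The function names and the example indices are hypothetical, and the default learning rate of 3e-5 is only inferred from the repository name.

```python
from typing import List, Set

Matrix = List[List[float]]


def masked_sgd_step(
    weights: Matrix,
    grads: Matrix,
    safety_rows: Set[int],
    lr: float = 3e-5,  # assumption: taken from the "lr3e-5" suffix in the repo name
) -> Matrix:
    """Apply one SGD step to the selected rows only; every other row stays frozen."""
    updated = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in safety_rows:
            # Selected neuron: ordinary gradient descent update.
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            # Frozen neuron: copy the weights unchanged.
            updated.append(list(w_row))
    return updated


def trainable_fraction(n_rows: int, safety_rows: Set[int]) -> float:
    """Return the share of rows (neurons) that receive updates."""
    return len(safety_rows) / n_rows


if __name__ == "__main__":
    weights = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
    grads = [[1.0, 1.0] for _ in weights]
    safety_rows = {1}  # hypothetical index of an identified safety neuron
    print(masked_sgd_step(weights, grads, safety_rows, lr=0.5))
    print(trainable_fraction(len(weights), safety_rows))
```

In a deep learning framework the same effect is usually obtained by zeroing the gradients of frozen parameters before the optimizer step. The explicit row loop here only makes the selection visible.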
Good For
- Applications where safety and responsible AI behavior are paramount.
- Use cases requiring a Llama 2-based model with improved resistance to generating harmful content.
- Developers looking for a model that balances general language understanding with specific safety enhancements without extensive retraining.