kmseong/llama3.1_8b_instruct_MATH-FT-resta-gamma0.3-lr5e-5
The kmseong/llama3.1_8b_instruct_MATH-FT-resta-gamma0.3-lr5e-5 is an 8 billion parameter Llama-3.2-3B-Instruct variant, fine-tuned using the Safety Neuron Tuning (SN-Tune) method. This model is specifically optimized for enhanced safety alignment by selectively fine-tuning only safety-critical neurons on the Circuit Breakers dataset. It aims to provide improved safety performance while preserving the base model's general capabilities, making it suitable for applications requiring robust safety features.
Loading preview...
Overview
This model, kmseong/llama3.1_8b_instruct_MATH-FT-resta-gamma0.3-lr5e-5, is an 8 billion parameter instruction-tuned variant of the Llama-3.2-3B-Instruct base model. It has been fine-tuned using a specialized technique called Safety Neuron Tuning (SN-Tune).
Key Capabilities & Features
- Enhanced Safety Alignment: The primary focus of this model is to improve safety alignment compared to its base model.
- SN-Tune Methodology: This method involves:
- Detecting specific "safety neurons" within the model.
- Freezing all other parameters.
- Fine-tuning only these safety neurons on dedicated safety datasets, such as the Circuit Breakers dataset.
- Parameter-Efficient Fine-tuning: By only adjusting a small subset of neurons, SN-Tune minimizes computational cost and avoids degrading general capabilities.
- Minimal Impact on General Performance: The selective fine-tuning approach is designed to enhance safety without significantly affecting the model's broader instruction-following abilities.
When to Use This Model
This model is particularly well-suited for use cases where:
- Safety is a critical concern: Applications requiring a higher degree of safety alignment and reduced generation of harmful content.
- Efficiency is important: When you need improved safety without the overhead of full model fine-tuning or significant performance trade-offs.
- Building on Llama-3.2-3B-Instruct: If your existing workflow or application is based on the Llama-3.2-3B-Instruct, this model offers a safety-enhanced drop-in replacement.
License
The model is licensed under the Apache 2.0 License, inheriting from its base model, meta-llama/Llama-3.2-3B-Instruct.