wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5
This is a 7.6 billion parameter language model, wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5, based on the Llama-3.2-3B-Instruct architecture. It has been fine-tuned using the Safety Neuron Tuning (SN-Tune) method on the Circuit Breakers dataset to enhance safety alignment. This selective fine-tuning approach focuses on modifying only specific 'safety neurons' while preserving general capabilities. Its primary strength lies in improved safety performance with minimal impact on its original instruction-following abilities.
Loading preview...
Model Overview
This model, wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5, is a 7.6 billion parameter language model derived from the meta-llama/Llama-3.2-3B-Instruct base. Its key differentiator is the application of Safety Neuron Tuning (SN-Tune), a specialized fine-tuning method designed to enhance safety alignment.
Key Capabilities & Features
- Enhanced Safety Alignment: Fine-tuned specifically on the Circuit Breakers dataset to improve safety responses.
- Parameter-Efficient Fine-tuning: SN-Tune selectively modifies only a small subset of 'safety neurons', freezing all other parameters. This ensures efficient training and minimizes the risk of degrading general model capabilities.
- Base Model Preservation: Aims to retain the core instruction-following and general language understanding abilities of the original Llama-3.2-3B-Instruct model.
When to Use This Model
This model is particularly well-suited for applications where:
- Safety is a paramount concern: Ideal for use cases requiring robust safety guardrails and reduced generation of harmful content.
- Maintaining base model performance is crucial: When you need the capabilities of Llama-3.2-3B-Instruct but with an added layer of safety without extensive retraining.
- Resource-efficient safety improvements are desired: The SN-Tune method offers a targeted approach to safety enhancement without the computational overhead of full model fine-tuning.