Overview
This model, kmseong/llama3.2_3b_new_SSFT_lr3e-5_nowramupratio, is a 3-billion-parameter instruction-tuned model based on Llama 3.2, developed by kmseong. It represents Phase 0 of the Safety-WaRP (Weight space Rotation Process) pipeline, which focuses on base safety training.
Key Capabilities
- Enhanced Safety Responses: The model has been fine-tuned using the Circuit Breakers dataset to establish fundamental safety mechanisms, enabling it to refuse harmful or inappropriate prompts.
- Foundation for Advanced Safety: It serves as the initial safety-trained base model for subsequent phases of the WaRP pipeline, which aim to restore utility while maintaining safety.
- Memory-Efficient Training: Training utilized an 8-bit optimizer and gradient accumulation, making the process more memory-efficient.
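The memory saving from gradient accumulation comes from processing a large batch as several smaller micro-batches and summing their gradients before each optimizer step. A minimal NumPy sketch with a toy linear model (illustrative only, not the actual training code) shows that the accumulated gradient matches the full-batch gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))  # full batch of 32 examples
y = rng.normal(size=32)
w = np.zeros(4)

def grad(Xb, yb, w):
    # Gradient of mean squared error for a linear model y_hat = Xb @ w.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient computed in one pass.
g_full = grad(X, y, w)

# The same gradient accumulated over 4 micro-batches of 8 examples:
# only one micro-batch needs to be held in memory at a time.
accum = np.zeros_like(w)
for i in range(0, 32, 8):
    accum += grad(X[i:i+8], y[i:i+8], w) / 4  # scale by micro-batch count

assert np.allclose(g_full, accum)
```

The same equivalence is what lets frameworks trade activation memory for extra forward/backward passes without changing the effective batch size.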
Training Details
Phase 0 fine-tuned the meta-llama/Llama-3.2-3B-Instruct base model on 1,000 samples from the Circuit Breakers safety dataset for 3 epochs. The learning rate followed a cosine schedule, decaying from 1e-5 to 0.
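The cosine decay described above can be sketched with its underlying formula (a minimal sketch assuming zero warmup steps; the actual training presumably used a library scheduler such as Hugging Face's `get_cosine_schedule_with_warmup`):

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-5, lr_min=0.0):
    """Cosine decay from lr_max at step 0 to lr_min at the final step."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 100))    # 1e-05 (initial learning rate)
print(cosine_lr(100, 100))  # 0.0   (fully decayed)
```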
Important Considerations
- Utility Reduction: As a Phase 0 model, its primary focus is safety. Consequently, its general utility, particularly on tasks like mathematics or reasoning, may be reduced relative to the original base model. Users who need a balance of safety and utility should prefer models that have completed Phase 3 of the WaRP pipeline.
Next Steps in WaRP Pipeline
This model is part of a multi-phase safety training process:
- Phase 1: Basis Construction (extracting basis vectors using SVD)
- Phase 2: Importance Scoring (identifying important parameters)
- Phase 3: Incremental Learning (restoring utility using datasets like GSM8K)
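Phase 1's basis construction can be illustrated with a toy SVD. This is a hypothetical sketch on a random stand-in matrix; the actual WaRP procedure operates on weights or activations of the full model, and the choice of matrix and rank here is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a weight (or activation) matrix from one layer.
W = rng.normal(size=(64, 16))

# SVD factors W into U @ diag(S) @ Vt; the columns of U form an
# orthonormal basis for W's column space, ordered by singular value.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

k = 8                 # keep only the top-k directions
basis = U[:, :k]      # extracted basis vectors for later phases

# Sanity check: the retained basis is orthonormal.
assert np.allclose(basis.T @ basis, np.eye(k))
```

Later phases would then score parameters along these directions (Phase 2) and fine-tune on utility data such as GSM8K while protecting the important ones (Phase 3).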