kmseong/llama3.2_3b_SSFT_epoch5_adam_lr4
kmseong/llama3.2_3b_SSFT_epoch5_adam_lr4 is a 3-billion-parameter model based on the Llama 3.2 architecture, developed by Min-Seong Kim, that has undergone Phase 0 of Safety-WaRP (Weight space Rotation Process) training. It is fine-tuned on the Circuit Breakers dataset to establish base safety mechanisms, making it suitable for applications requiring initial safety filtering. The model features a 32,768-token context length and is designed as a foundation for further safety and utility enhancements.
Model Overview
This model, kmseong/llama3.2_3b_SSFT_epoch5_adam_lr4, is a 3-billion-parameter Llama 3.2-based language model developed by Min-Seong Kim. It represents Phase 0 of the Safety-WaRP (Weight space Rotation Process) pipeline, focusing on establishing fundamental safety mechanisms.
Key Capabilities & Training
- Base Safety Training: Fine-tuned using the Circuit Breakers dataset over 3 epochs to instill safety responses.
- Safety-WaRP Method: Utilizes a specialized weight space rotation process for safety alignment.
- Architecture: Based on the meta-llama/Llama-3.2-3B-Instruct model, with bfloat16 precision and gradient checkpointing.
- Training Configuration: Employed an AdamW8bit optimizer, a cosine scheduler, and an effective batch size of 8 during fine-tuning.
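The training setup above can be summarized as a plain configuration dictionary. This is a hedged sketch, not the author's actual training script: the per-device batch size and gradient-accumulation steps are assumptions, since the model card only states their product (an effective batch size of 8).

```python
# Sketch of the stated fine-tuning configuration as a plain dict.
# Only optimizer, scheduler, dtype, checkpointing, and the effective
# batch size of 8 come from the model card; the per-device batch size
# and accumulation steps below are assumed example values.
training_config = {
    "base_model": "meta-llama/Llama-3.2-3B-Instruct",
    "optimizer": "adamw_8bit",          # AdamW8bit optimizer
    "lr_scheduler_type": "cosine",      # cosine schedule
    "torch_dtype": "bfloat16",
    "gradient_checkpointing": True,
    "per_device_train_batch_size": 2,   # assumed
    "gradient_accumulation_steps": 4,   # assumed
}

# The effective batch size is the product of the two batching knobs.
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 8, matching the stated effective batch size
```

In frameworks such as Hugging Face `transformers`, these keys map onto the usual `TrainingArguments` fields; the dict form here just makes the stated hyperparameters explicit.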
Important Considerations
This model marks the completion of Phase 0, meaning its primary focus has been safety training. Consequently, its general utility, particularly on tasks such as mathematics or reasoning, may be reduced. It is intended as a foundational component for subsequent phases of the WaRP pipeline, which aim to restore utility while maintaining safety. For a balanced model with both safety and utility, users are advised to consider models that have completed Phase 3 of the WaRP process.