Model Overview
This model, kmseong/llama3.2_3b_new_SSFT_lr5e-5, is a 3.2 billion parameter variant of the Llama 3.2-3B-Instruct base model. It represents Phase 0 of the Safety-WaRP (Weight space Rotation Process) pipeline, focusing on base safety training.
Key Capabilities & Training
- Safety-Focused: The model has been fine-tuned using the Circuit Breakers dataset to build initial safety mechanisms and generate refusal responses to harmful prompts.
- Training Method: Utilizes the Safety-WaRP methodology, specifically its initial phase for safety alignment.
- Training Details: Trained for 3 epochs on 1000 samples from the Circuit Breakers dataset, employing gradient accumulation and an 8-bit optimizer.
- Architecture: Based on Llama 3.2 architecture with 3.2B parameters and bfloat16 precision.
Intended Use & Limitations
- Primary Use Case: Serves as a foundational model with enhanced safety responses, intended as a base for subsequent phases of the WaRP pipeline.
- Current State: As a Phase 0 model, it has completed safety training. However, its utility in areas like mathematics or reasoning might be reduced. For a balanced model with both safety and utility, users are advised to consider models that have completed Phase 3 of the WaRP pipeline.
- Next Steps: This model is a precursor to Phase 1 (Basis Construction), Phase 2 (Importance Scoring), and Phase 3 (Incremental Learning for utility restoration).