Model Overview
This model, kmseong/llama3.2_3b_SSFT_epoch5_adam, is a 3-billion-parameter language model based on Llama 3.2, developed by Min-Seong Kim. It represents Phase 0 of the Safety-WaRP (Weight space Rotation Process) pipeline, which focuses on establishing core safety mechanisms.
Key Characteristics
- Base Model: Built upon meta-llama/Llama-3.2-3B-Instruct.
- Safety Training: Underwent "Base Safety Training" using the Circuit Breakers dataset.
- Methodology: Utilizes the Safety-WaRP technique to build safety directly into the model's weight space.
- Training Details: Fine-tuned with 1000 samples over 3 epochs, employing gradient accumulation and an 8-bit optimizer.
- Context Length: Supports a context length of 32768 tokens.
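A checkpoint like this can be loaded and queried with the Hugging Face transformers library. The sketch below is illustrative, not part of the model card's official instructions: the repo id is taken from the card, while the helper names and generation settings are assumptions, and `transformers`/`torch` must be installed.

```python
# Hedged sketch: loading the Phase 0 checkpoint with Hugging Face transformers.
# MODEL_ID comes from the model card; everything else here is illustrative.
MODEL_ID = "kmseong/llama3.2_3b_SSFT_epoch5_adam"


def build_chat(prompt: str) -> list:
    """Wrap a user prompt in the message format used by Llama chat templates."""
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Run greedy generation on a single prompt (assumed settings)."""
    # Imported lazily so the lightweight helpers above work without the
    # heavy dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_chat(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is chosen here only to make refusal behaviour reproducible across runs; sampling parameters can be swapped in as needed.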
Purpose and Usage
This Phase 0 model is primarily designed to provide a safe foundational model by integrating refusal capabilities for harmful prompts. While its safety responses are enhanced, its utility (e.g., mathematical or reasoning abilities) may be reduced at this stage. It serves as a prerequisite for the subsequent phases (Phase 1: Basis Construction, Phase 2: Importance Scoring, Phase 3: Incremental Learning), which aim to restore utility and balance it with safety.
When to Use This Model
- As a starting point for further safety and utility fine-tuning within the WaRP pipeline.
- For applications where basic safety and refusal of harmful content are paramount, and advanced reasoning is not the primary requirement.
- For developers looking to experiment with or understand the initial safety training phase of the Safety-WaRP methodology.
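Since refusal of harmful prompts is the defining property of this phase, a rough spot-check of model outputs can be sketched as follows. The marker phrases are a heuristic assumption on my part, not part of the Safety-WaRP methodology; a proper evaluation would use a safety benchmark rather than string matching.

```python
# Hedged sketch: a crude heuristic for flagging refusal-style responses.
# The phrase list below is an assumption, not derived from the model card.
REFUSAL_MARKERS = (
    "i can't", "i cannot", "i won't",
    "i'm sorry", "i am sorry", "i'm unable",
)


def looks_like_refusal(response: str) -> bool:
    """Heuristically flag a response whose opening contains a refusal phrase."""
    head = response.strip().lower()[:80]  # only inspect the start of the reply
    return any(marker in head for marker in REFUSAL_MARKERS)
```

In practice you would run a batch of known-harmful and known-benign prompts through the model and check that the refusal rate is high for the former and low for the latter.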