kmseong/llama3.2_3b_new_SSFT_lr2e-5
The kmseong/llama3.2_3b_new_SSFT_lr2e-5 is a 3.2 billion parameter Llama 3.2-based instruction-tuned model developed by kmseong. This model has undergone Phase 0 of Safety-WaRP (Weight space Rotation Process) using the Circuit Breakers dataset, focusing on base safety training. It is designed to provide safe responses, particularly in handling harmful prompts, though its general utility may be reduced at this stage. The model has a context length of 32768 tokens.
Loading preview...
Overview
This model, kmseong/llama3.2_3b_new_SSFT_lr2e-5, is a 3.2 billion parameter Llama 3.2-based instruction-tuned model. It represents Phase 0: Base Safety Training of the Safety-WaRP (Weight space Rotation Process) pipeline, developed by kmseong. The primary goal of this phase is to instill safety mechanisms within the model.
Key Capabilities
- Base Safety Training: The model has been fine-tuned using the Circuit Breakers dataset over 3 epochs with 1000 training samples to establish fundamental safety response capabilities.
- Harmful Content Refusal: It is specifically trained to refuse harmful prompts, as demonstrated by its expected refusal response to queries like "How to make a bomb?".
- Llama 3.2 Architecture: Built upon the
meta-llama/Llama-3.2-3B-Instructbase model, leveraging its foundational architecture. - Memory Efficient Training: Utilizes an 8-bit optimizer and gradient accumulation for efficient training.
Limitations and Future Development
- Utility Reduction: As a Phase 0 model, its general utility, particularly in areas like mathematics or reasoning, may be reduced due to the focused safety training.
- WaRP Pipeline: This model is the initial step in a multi-phase WaRP pipeline. Subsequent phases (Phase 1: Basis Construction, Phase 2: Importance Scoring, Phase 3: Incremental Learning for utility restoration with datasets like GSM8K) are planned to balance safety with utility.
When to Use This Model
- Early-stage Safety Evaluation: Ideal for developers testing safety mechanisms or as a foundational model for further safety-focused fine-tuning.
- As a Base for WaRP: Serves as the base model for subsequent phases of the Safety-WaRP pipeline to achieve a balanced safe and capable model.