kmseong/safety-warp-Llama-3.2-3b-phase3-perlayer-rsn-tuned-start
The kmseong/safety-warp-Llama-3.2-3b-phase3-perlayer-rsn-tuned-start model is a 3 billion parameter language model from the Llama 3.2 family, developed by kmseong, that incorporates a weight space rotation process for safety alignment. It applies per-layer adjustments to the attention (q, k, v) and MLP (up, down) components, followed by non-freeze training. The model is designed specifically for safety alignment, making it suitable for applications where robust ethical and safety guardrails are paramount.
Overview
The kmseong/safety-warp-Llama-3.2-3b-phase3-perlayer-rsn-tuned-start is a 3 billion parameter language model from the Llama 3.2 family, developed by kmseong. It is distinguished by its approach to safety alignment, which uses a Weight space Rotation Process (WARP). The model applies per-layer modifications to the attention mechanism's query, key, and value projections, as well as the MLP's up and down projections. Following these structural adjustments, the model undergoes a non-freeze training phase, in which the rotated weights remain trainable rather than frozen.
Key Capabilities
- Enhanced Safety Alignment: Designed with a focus on safety through its Weight space Rotation Process (WARP).
- Targeted Architectural Modifications: Incorporates per-layer adjustments to critical components like attention and MLP for fine-grained control.
- Specialized Training Methodology: Utilizes a non-freeze training approach after initial per-layer modifications.
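The card does not specify the exact WARP procedure, but the core idea of a per-layer weight-space rotation can be illustrated with a small sketch. The snippet below (a hypothetical toy example using NumPy, not the model's actual implementation) applies a shared orthogonal rotation to a layer's q, k, v, up, and down weight matrices; because the rotation is orthogonal, each matrix's Frobenius norm is preserved while its orientation in weight space changes.

```python
import numpy as np

def random_orthogonal(d: int, seed: int = 0) -> np.ndarray:
    """Sample a random d x d orthogonal matrix via QR decomposition."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def rotate_weight(w: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Rotate a weight matrix in its output space: W' = R @ W."""
    return r @ w

# Toy "layer" holding the components the card names: attention q/k/v
# and the MLP up/down projections (sizes are illustrative only).
d = 8
rng = np.random.default_rng(1)
layer = {name: rng.standard_normal((d, d))
         for name in ["q", "k", "v", "up", "down"]}

r = random_orthogonal(d)
rotated = {name: rotate_weight(w, r) for name, w in layer.items()}

# Orthogonal rotation leaves each matrix's Frobenius norm unchanged.
for name in layer:
    assert np.isclose(np.linalg.norm(layer[name]),
                      np.linalg.norm(rotated[name]))
```

In a non-freeze training phase, the rotated matrices would then be further optimized rather than held fixed, consistent with the training methodology described above.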
Good for
- Applications requiring strong safety alignment and ethical considerations.
- Research into novel safety alignment techniques for large language models.
- Scenarios where a 3 billion parameter model with a 32768 token context length is suitable for safety-critical tasks.