kmseong/safety-warp-Llama-3.2-3b-phase3-perlayer-non-freeze
The kmseong/safety-warp-Llama-3.2-3b-phase3-perlayer-non-freeze model is a 3-billion-parameter language model based on the Llama 3.2 architecture, developed by kmseong. It incorporates a "Weight space Rotation Process" (WARP) for safety alignment, applying this process per-layer to the attention (q, k, v) and MLP (up, down) projections. The model is further refined through a non-freeze training phase, making it suitable for applications requiring enhanced safety characteristics.
Overview
The kmseong/safety-warp-Llama-3.2-3b-phase3-perlayer-non-freeze model is a 3-billion-parameter model built on the Llama 3.2 architecture. Its primary differentiator is a novel "Weight space Rotation Process" (WARP) for safety alignment, detailed in the forthcoming paper "Safety Alignment via Weight space Rotation Process". The process is applied specifically to the attention (q, k, v) and MLP (up, down) layers on a per-layer basis, followed by a non-freeze training phase that further refines the model's capabilities.
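Because WARP is applied to the weights during training, no special runtime code should be needed: the model can presumably be loaded like any other causal LM on the Hugging Face Hub. The sketch below assumes a standard `transformers` setup; the prompt, dtype, and generation settings are illustrative choices, not part of the model card.

```python
# Minimal sketch: loading the model with Hugging Face transformers.
# Assumes a standard causal-LM checkpoint; dtype/device settings are
# illustrative and may need adjusting for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kmseong/safety-warp-Llama-3.2-3b-phase3-perlayer-non-freeze"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # halves memory vs. fp32 on supported GPUs
        device_map="auto",           # place layers across available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Explain why seatbelts matter."))
```

Downloading a ~3B-parameter checkpoint requires several gigabytes of disk and GPU (or CPU) memory; quantized loading (e.g. via `bitsandbytes`) is an option on smaller hardware.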
Key Capabilities
- Enhanced Safety Alignment: Uses the Weight space Rotation Process (WARP) to align the model in weight space rather than solely through fine-tuning data.
- Targeted Layer Modification: Applies safety alignment techniques specifically to attention and MLP layers.
- Refined Training: Undergoes a non-freeze training phase for improved performance post-alignment.
Good for
- Applications requiring models with explicit safety alignment mechanisms.
- Research into novel safety alignment techniques, particularly those involving weight space manipulation.
- Use cases where a compact 3-billion-parameter model with a focus on safety is beneficial.