kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT-non-freeze-lr5e-5
kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT-non-freeze-lr5e-5 is a 3-billion-parameter model based on the Llama 3.2 architecture, fine-tuned with a 32768-token context length. It incorporates a Weight space Rotation Process (WaRP) for safety alignment, applying per-layer adjustments to the attention (q, k, v) and MLP (up, down) projections. The model targets both utility and safety, undergoing non-freeze (full-parameter) fine-tuning at a learning rate of 5e-5 after these modifications.
Model Overview
kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT-non-freeze-lr5e-5 is a 3-billion-parameter language model based on the Llama 3.2 architecture, featuring an extended context length of 32768 tokens. It has undergone a specialized fine-tuning process that integrates a Weight space Rotation Process (WaRP), primarily aimed at enhancing safety alignment.
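The checkpoint should be loadable with the standard Hugging Face `transformers` API; the snippet below is a minimal sketch that assumes the repository follows the usual Llama 3.2 layout on the Hub (tokenizer and weights in the same repo).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT-non-freeze-lr5e-5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 3B weights around 6 GB
    device_map="auto",           # requires the `accelerate` package
)
```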
Key Capabilities and Modifications
- WaRP Integration: The model incorporates the Weight space Rotation Process, a technique detailed in [warp2024citation], for safety alignment.
- Per-Layer Adjustments: Specific modifications have been applied to the attention mechanism (query, key, and value projections) and the MLP layers (up and down projections) on a per-layer basis (see the sketch after this list).
- Non-Freeze Training: Following these weight-space modifications, the model was subjected to non-freeze training at a learning rate of 5e-5, leaving all parameters trainable.
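For orientation, the sketch below shows what a per-layer rotation of the listed modules could look like in principle. It illustrates the general idea only and is not the authors' WaRP procedure: the orthogonal `basis` tensors and the `apply_rotations` helper are hypothetical stand-ins for whatever utility/safety basis the actual method derives.

```python
import torch

def rotate_weight(weight: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Illustrative only: re-express a weight matrix in an orthogonal basis.

    `basis` is assumed orthogonal (basis @ basis.T == I), so the layer's
    outputs are rotated rather than distorted.
    """
    return basis @ weight

# The module suffixes match the components the card names: attention
# q/k/v projections and the MLP up/down projections of each layer.
TARGET_SUFFIXES = (
    "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
    "mlp.up_proj", "mlp.down_proj",
)

def apply_rotations(model, bases: dict[str, torch.Tensor]) -> None:
    # `bases` maps a module name to a hypothetical orthogonal basis for it.
    for name, module in model.named_modules():
        if name.endswith(TARGET_SUFFIXES) and name in bases:
            with torch.no_grad():
                module.weight.copy_(rotate_weight(module.weight, bases[name]))
```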
Good For
- Safety-Oriented Applications: Its primary differentiator is the WaRP-based safety alignment, making it suitable for use cases where robust safety features are critical.
- Research into Safety Alignment: Developers and researchers interested in exploring the effects of weight space rotation for safety in LLMs may find this model particularly relevant.
- Utility-Focused Tasks: The model is built on a general utility basis, suggesting applicability across a range of common language generation and understanding tasks while prioritizing safety (see the usage sketch below).
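A typical inference call might look like the following. The chat template and generation settings are assumptions based on standard Llama 3.2 instruct conventions, not documented specifics of this checkpoint; `tokenizer` and `model` are loaded as in the snippet above.

```python
messages = [
    {"role": "user",
     "content": "Summarize the safety considerations of deploying an LLM chatbot."}
]

# Assumes the tokenizer ships a chat template, as Llama 3.2 instruct models do.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```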