kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT-non-freeze-lr1e-5
The kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT-non-freeze-lr1e-5 model is a 3-billion-parameter language model based on the Llama 3.2 architecture, with a 32768-token context length. Its fine-tuning is applied per layer to the attention projections (query, key, value) and the MLP projections (up, down). The model is fine-tuned with a non-freeze approach, in which all parameters are updated, and is specifically designed for safety alignment through a Weight space Rotation Process (WaRP).
Model Overview
The kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT-non-freeze-lr1e-5 is a 3-billion-parameter language model built on the Llama 3.2 architecture, supporting a 32768-token context window. What distinguishes this model is its fine-tuning methodology, which focuses on safety alignment.
Key Technical Details
- Architecture: Llama 3.2 base with 3 billion parameters.
- Context Length: Supports inputs up to 32768 tokens.
- Attention and MLP: Applies per-layer modifications to the attention projections (query, key, value) and the MLP projections (up, down).
- Training Method: Non-freeze fine-tuning, meaning all layers were updated during training, at a learning rate of 1e-5 (per the model name).
- Safety Alignment: The core differentiator is its "Weight space Rotation Process" (WaRP), a technique aimed at enhancing safety alignment.
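The card does not document WaRP's internals. As a purely illustrative sketch (not the authors' implementation), the general idea suggested by the name can be pictured as rotating a weight matrix into an orthonormal "utility basis" and, in the non-freeze variant, updating all rotated coordinates rather than masking some out. Every quantity below (the weight matrix, the activations used to build the basis, the placeholder gradient) is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a weight matrix W and a batch of "utility" activations A.
# Both are stand-ins; the real WaRP basis construction is not documented here.
W = rng.standard_normal((8, 8))
A = rng.standard_normal((8, 64))

# Orthonormal basis from the activations' left singular vectors.
U, _, _ = np.linalg.svd(A, full_matrices=True)  # U: (8, 8), U @ U.T == I

# Rotate the weights into the basis, take a gradient-style step on ALL
# rotated coordinates (the "non-freeze" variant: nothing is masked out),
# then rotate back to the original parameter space.
W_rot = W @ U
grad_rot = rng.standard_normal(W_rot.shape)  # placeholder gradient
lr = 1e-5                                    # learning rate from the model name
W_rot_updated = W_rot - lr * grad_rot
W_updated = W_rot_updated @ U.T

# Because U is orthonormal, rotation with no update is lossless.
assert np.allclose((W @ U) @ U.T, W)
```

A frozen variant would zero the gradient on a subset of the rotated coordinates before the update; the non-freeze setting used here updates all of them.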
Potential Use Cases
This model is particularly suited for applications where safety and controlled output generation are paramount. Its specialized fine-tuning for safety alignment via WaRP suggests its utility in:
- Content Moderation: Filtering or identifying unsafe content.
- Responsible AI Development: Building applications that require robust safety guardrails.
- Research into Safety Alignment: Exploring the effectiveness of the WaRP method for mitigating harmful outputs.
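For trying the model in any of these settings, a minimal loading sketch follows. It assumes the repository hosts a standard Hugging Face causal-LM checkpoint loadable with the transformers library (an assumption not confirmed by the card), and the download is roughly the size of a 3B-parameter checkpoint:

```python
# Usage sketch: assumes a standard transformers-compatible causal-LM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT-non-freeze-lr1e-5"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

prompt = "Explain briefly why refusing unsafe requests matters."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```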