kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT
The kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT model is a 3-billion-parameter language model, based on the Llama 3.2 architecture, fine-tuned for safety alignment. It incorporates a Weight space Rotation Process (WaRP) for enhanced safety, applying per-layer adjustments to the attention (q, k, v) and MLP (up, down) projections. This model is designed for applications requiring robust safety behavior in language generation while preserving general utility.
Model Overview
kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT is a 3-billion-parameter language model built upon the Llama 3.2 architecture. Its core innovation is the application of a Weight space Rotation Process (WaRP), a technique aimed at improving safety alignment. This process applies per-layer modifications to the model's attention projections (query, key, value) and Multi-Layer Perceptron (MLP) blocks (up and down projections), followed by fine-tuning in which these components are left unfrozen.
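The WaRP method itself is described in the paper cited below; as a purely illustrative sketch of the general idea of a weight-space rotation (not the actual WaRP algorithm), an orthogonal rotation can be applied to a projection weight without changing the norms the layer produces. All names and shapes here are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d: int) -> np.ndarray:
    """Sample a random orthogonal matrix via QR decomposition."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    # Fix column signs so Q is a proper, uniformly distributed rotation.
    return Q * np.sign(np.diag(R))

# Toy "layer": a query projection weight of shape (d_out, d_in).
W_q = rng.normal(size=(64, 32))

# Rotate the output basis of the projection; because R is orthogonal,
# the layer's outputs are re-expressed in a new basis, not distorted.
R = random_rotation(64)
W_q_rot = R @ W_q

x = rng.normal(size=32)
assert np.allclose(R.T @ R, np.eye(64), atol=1e-10)  # R is orthogonal
assert np.allclose(np.linalg.norm(W_q_rot @ x), np.linalg.norm(W_q @ x))
```

In WaRP, analogous per-layer transformations are applied to the q, k, v and MLP up/down projections before fine-tuning; the specific basis used is what distinguishes the method from a random rotation like the one above.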
Key Capabilities
- Safety Alignment: Utilizes the WaRP method for enhanced safety characteristics.
- Llama 3.2 Base: Benefits from the foundational capabilities of the Llama 3.2 architecture.
- Parameter Efficiency: At 3 billion parameters, it offers a balance between performance and computational cost.
- Fine-tuned for Utility and Basic Safety: Optimized for general utility while integrating fundamental safety measures.
Good For
- Applications requiring a language model with built-in safety considerations.
- Scenarios where a smaller, efficient model with safety alignment is preferred.
- Research into safety alignment techniques, particularly the WaRP method.
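A minimal usage sketch with the Hugging Face transformers library, assuming the checkpoint is hosted on the Hub under this repo id with a standard Llama 3.2 tokenizer and config (the function name and parameters below are illustrative, not part of the model card):

```python
MODEL_ID = "kmseong/llama3.2-3b-WaRP-utility-basis-safety-FT"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Run a single-turn chat completion with the model."""
    # Imports kept local so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Format the prompt with the tokenizer's built-in chat template.
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

Generation parameters (temperature, sampling) are left at their defaults here; tune them to your application's safety and utility requirements.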
Citation
For academic reference, please cite the WaRP paper:
@article{warp2024,
  title={Safety Alignment via Weight space Rotation Process},
  author={Your Name},
  year={2026}
}