Name: kmseong/llama2-7b-chat-lr5e-5-mmlu-lr5e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

WaRP-Safety-Llama3_8B_Instruct: Safety-Aligned Llama 3.1

kmseong/WaRP-Safety-Llama3_8B_Instruct is an 8-billion-parameter model based on meta-llama/Llama-3.1-8B-Instruct, developed by Min-Seong Kim. It is trained with the Safety-First Weight space Rotation Process (WaRP), a three-phase pipeline designed to strengthen safety alignment without significantly compromising utility.

Key Capabilities & Training Highlights

Advanced Safety Alignment: Uses the WaRP methodology to protect safety mechanisms and maintain refusal of harmful requests.
Balanced Safety-Utility Tradeoff: Improves utility on reasoning tasks (e.g., GSM8K) while using gradient masking to preserve safety behavior.
Three-Phase Training: Consists of basis construction from safety data, gradient-based importance scoring, and incremental learning with gradient masking to protect important safety directions.
Targeted Neuron Protection: Identifies and protects 419 important neurons in layer 31 during training to preserve safety behavior.
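
The three phases listed above can be illustrated with a toy sketch. This is not the author's implementation: the listing does not say how the basis is built or how importance is thresholded, so the hand-picked 2D rotation, the absolute-gradient score, the top-k rule, and every function name below are assumptions made only for illustration.

```python
import math

def rotate(basis, vec):
    """Express vec in the coordinates of an orthonormal basis (rows = directions)."""
    return [sum(b * v for b, v in zip(row, vec)) for row in basis]

def rotate_back(basis, coords):
    """Map basis coordinates back to the original weight space (basis transpose)."""
    return [sum(row[j] * c for row, c in zip(basis, coords))
            for j in range(len(basis[0]))]

def protection_mask(scores, k):
    """Return 0.0 for the k highest-scoring (protected) directions, 1.0 otherwise."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    protected = set(ranked[:k])
    return [0.0 if i in protected else 1.0 for i in range(len(scores))]

def masked_sgd_step(weights, grad, basis, mask, lr):
    """One SGD step whose gradient is zeroed along protected basis directions."""
    coords = rotate(basis, weights)
    grad_coords = rotate(basis, grad)
    updated = [c - lr * m * g for c, m, g in zip(coords, mask, grad_coords)]
    return rotate_back(basis, updated)

# Phase 1 (assumed): a basis derived from safety data; here a fixed 45-degree rotation.
c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
basis = [[c, s], [-s, c]]

# Phase 2 (assumed): importance = magnitude of the safety gradient in the rotated basis.
safety_grad = [1.0, 1.0]
scores = [abs(g) for g in rotate(basis, safety_grad)]
mask = protection_mask(scores, k=1)

# Phase 3: fine-tune on a utility task while leaving the protected direction untouched.
weights = [0.5, -0.2]
utility_grad = [0.3, 0.1]
new_weights = masked_sgd_step(weights, utility_grad, basis, mask, lr=0.1)
```

In this toy setup the weight component along the protected direction is identical before and after the step, while the unprotected direction still receives the full update. That is the intended effect of gradient masking as the listing describes it.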

Good For

Applications requiring a strong emphasis on safety and refusal of harmful content.
Use cases where a balance between safety and general reasoning performance is critical.
Developers looking for a Llama 3.1 Instruct variant with enhanced safety alignment through a specialized fine-tuning process.
