Name: kmseong/llama3.1-8b-base-warp-gsm8k-lr1e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

WaRP-Safety-Llama3_8B_Instruct: Safety-Aligned Llama 3.1

This model, developed by kmseong, is an 8-billion-parameter Llama 3.1 Instruct variant fine-tuned for safety alignment using the Weight space Rotation Process (WaRP). It targets the tradeoff between utility and safety: improving task performance while preserving the model's ability to refuse harmful requests.

Key Capabilities & Features

Safety-First WaRP Training: Uses a three-phase pipeline:
- Basis Construction: Identifies important neurons related to safety from FFN layers using SVD on safety data.
- Importance Scoring: Calculates gradient-based importance scores to generate masks for critical directions.
- Incremental Learning: Fine-tunes on utility tasks (like GSM8K) with gradient masking to protect identified safety mechanisms.
Balanced Safety-Utility Tradeoff: Designed to improve utility on reasoning tasks while preserving refusal capabilities for harmful requests.
Base Model: Built upon meta-llama/Llama-3.1-8B-Instruct.
Training Data: Utilizes LibrAI/do-not-answer for safety alignment and openai/gsm8k for utility improvement.
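
The three WaRP phases listed above can be illustrated with a toy sketch in pure Python. This is not the author's implementation: it works on a small weight vector rather than FFN weight matrices, it assumes the orthonormal basis is already available (in the described pipeline it would come from SVD on safety data), and it scores importance by summed absolute gradient coefficients, which is only one plausible choice. All function names (`importance_mask`, `masked_step`, and the helpers) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def to_basis(basis, vec):
    """Coefficients of vec along each orthonormal basis direction."""
    return [dot(b, vec) for b in basis]

def from_basis(basis, coeffs):
    """Map basis coefficients back to the original coordinates."""
    dim = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(dim)]

def importance_mask(basis, safety_grads, num_frozen):
    """Phase 2 (assumed form): score each direction by the summed absolute
    gradient coefficient on safety data, then freeze the top-scoring ones.
    Returns 0.0 for frozen directions and 1.0 for trainable ones."""
    scores = [0.0] * len(basis)
    for grad in safety_grads:
        for i, coeff in enumerate(to_basis(basis, grad)):
            scores[i] += abs(coeff)
    ranked = sorted(range(len(basis)), key=lambda i: scores[i], reverse=True)
    frozen = set(ranked[:num_frozen])
    return [0.0 if i in frozen else 1.0 for i in range(len(basis))]

def masked_step(weights, grad, basis, mask, lr):
    """Phase 3 (assumed form): one gradient step on a utility task in which
    the gradient components along frozen directions are zeroed out."""
    coeffs = to_basis(basis, grad)
    kept = [c * m for c, m in zip(coeffs, mask)]
    update = from_basis(basis, kept)
    return [w - lr * u for w, u in zip(weights, update)]

# Phase 1 stand-in: a fixed 2-D orthonormal basis (a 45-degree rotation).
s = 1 / math.sqrt(2)
basis = [[s, s], [s, -s]]

# Made-up safety gradients that point mostly along the first direction.
safety_grads = [[1.0, 0.9], [0.8, 1.0]]
mask = importance_mask(basis, safety_grads, num_frozen=1)

# A made-up utility gradient; the step must not move along the frozen direction.
new_weights = masked_step([0.0, 0.0], [1.0, 0.0], basis, mask, lr=0.1)
```

In this toy setup the first direction is frozen, so the update moves the weights only along the second direction; the component of the weight change along the protected direction stays at zero.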

Good For

Applications requiring a strong 8B language model with enhanced safety features.
Use cases where maintaining refusal capabilities for harmful content is paramount.
Scenarios demanding improved performance on mathematical and reasoning tasks (e.g., GSM8K) without compromising safety.
Developers looking for a Llama 3.1 variant that has undergone specific safety alignment training.
