kmseong/llama3_2_3b-instruct-WaRP_lr5e-5

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 28, 2026 · License: llama3.1 · Architecture: Transformer

kmseong/llama3_2_3b-instruct-WaRP_lr5e-5 is a 3.2 billion parameter instruction-tuned model based on Llama 3.2 3B Instruct, fine-tuned by kmseong with the Weight space Rotation Process (WaRP) for stronger safety alignment. The model is designed to keep refusing harmful requests while improving utility on reasoning tasks, offering a balanced safety-utility tradeoff. Its primary use case is applications that need robust safety mechanisms alongside general instruction following.


Overview

kmseong/llama3_2_3b-instruct-WaRP_lr5e-5 is a 3.2 billion parameter model derived from Llama 3.2 3B Instruct, fine-tuned by kmseong using the Weight space Rotation Process (WaRP). This three-phase training pipeline focuses on safety alignment, aiming for a model that reliably refuses harmful requests while preserving its utility on general tasks.
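
Assuming the checkpoint ships the standard Llama instruct chat template (as its base model does), it can be loaded like any other causal language model from the Hugging Face Hub. Below is a minimal inference sketch using the transformers library; the prompt and generation settings are illustrative assumptions, not recommendations from the model author.

```python
# Minimal inference sketch; requires `torch` and `transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kmseong/llama3_2_3b-instruct-WaRP_lr5e-5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"},
]

# Build the prompt with the chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```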

Key Capabilities

  • Enhanced Safety Alignment: Uses a Safety-First WaRP method that protects safety mechanisms through gradient masking (see the sketch after this list).
  • Harmful Request Refusal: Maintains strong refusal capabilities when confronted with unsafe or harmful prompts.
  • Balanced Safety-Utility: Improves utility on reasoning tasks (e.g., GSM8K) while preserving refusal behavior.
  • Targeted Fine-tuning: Employs a multi-phase training approach involving basis construction, importance scoring, and incremental learning to precisely align the model.
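
The gradient-masking step mentioned above can be illustrated with a toy example. The sketch below is a simplified illustration on a stand-in linear layer, not the author's implementation: using squared gradients on a "safety" batch as importance scores, the fixed 20% protection threshold, and the optimizer choice are all assumptions, and the basis-construction (weight space rotation) phase is omitted entirely.

```python
# Toy sketch of importance-based gradient masking (all specifics are assumptions).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 4)          # stand-in for one transformer weight matrix
loss_fn = nn.CrossEntropyLoss()

# Importance scoring: accumulate squared gradients from a batch of
# safety-critical examples.
safety_x, safety_y = torch.randn(32, 16), torch.randint(0, 4, (32,))
model.zero_grad()
loss_fn(model(safety_x), safety_y).backward()
importance = {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}

# Build masks that freeze the most safety-critical 20% of each tensor.
masks = {}
for name, score in importance.items():
    k = max(1, int(0.2 * score.numel()))
    threshold = score.flatten().topk(k).values.min()
    masks[name] = (score < threshold).float()   # 1 = trainable, 0 = protected

# Incremental learning on "utility" data, with gradients on the protected
# coordinates zeroed out before every optimizer step.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # lr suggested by the model name
utility_x, utility_y = torch.randn(32, 16), torch.randint(0, 4, (32,))
for _ in range(10):
    optimizer.zero_grad()
    loss_fn(model(utility_x), utility_y).backward()
    for name, param in model.named_parameters():
        param.grad.mul_(masks[name])            # mask out protected weights
    optimizer.step()
```

Masking individual gradient coordinates, rather than freezing whole layers, lets utility fine-tuning reach most of the network while leaving the weights most implicated in refusal behavior unchanged.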

Good for

  • Applications requiring a high degree of safety and refusal for harmful content.
  • Use cases where balancing utility and safety is critical.
  • Scenarios needing a smaller, instruction-tuned model with robust safety mechanisms for general reasoning and instruction following.