kmseong/llama2_7b-SSFT-WaRP_original_space_freeze_60

TEXT GENERATION · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Apr 30, 2026 · License: llama3.1 · Architecture: Transformer

The kmseong/llama2_7b-SSFT-WaRP_original_space_freeze_60 model is a fine-tune of Llama 3.1 8B Instruct by Min-Seong Kim, trained with the Safety-First WaRP (Weight space Rotation Process) method; despite the "llama2_7b" prefix in its name, the base model is the 8-billion-parameter meta-llama/Llama-3.1-8B-Instruct. The model targets safety alignment: it preserves refusal behavior on harmful requests while improving utility on reasoning tasks such as GSM8K, by protecting safety-critical weights with gradient masking during fine-tuning.


Model Overview

The kmseong/llama2_7b-SSFT-WaRP_original_space_freeze_60 model is a specialized fine-tune of the meta-llama/Llama-3.1-8B-Instruct base model, developed by Min-Seong Kim. It is trained with Safety-First WaRP (Weight space Rotation Process), a three-phase pipeline that strengthens safety alignment while preserving, and in places improving, utility.
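A minimal inference sketch, assuming the checkpoint loads through the standard Hugging Face transformers APIs and inherits the usual Llama 3.1 chat template from its base model; the prompt and generation settings below are illustrative, not prescribed by the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kmseong/llama2_7b-SSFT-WaRP_original_space_freeze_60"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed local dtype; the hosted endpoint serves FP8
    device_map="auto",
)

# Llama 3.1 Instruct checkpoints expose a chat template via the tokenizer.
messages = [{"role": "user", "content": "A hen lays 5 eggs a day. How many eggs do 12 hens lay in a week?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```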

Key Capabilities & Training

The training process proceeds in three phases (a toy code sketch follows this list):

  • Basis Construction: Identifying important neurons in FFN layers using safety data and SVD.
  • Importance Scoring: Calculating gradient-based importance scores to generate masks for critical directions.
  • Incremental Learning: Fine-tuning on utility tasks (like GSM8K) with gradient masking to protect safety-critical weights, ensuring a balanced safety-utility tradeoff.
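The model card does not include training code, so the sketch below is only a toy PyTorch illustration of the three phases on a single stand-in FFN matrix. Every name in it (W, the placeholder safety and utility losses) is an assumption, and reading the "freeze_60" suffix as "freeze the top 60% of basis directions" is a guess, not a documented fact:

```python
import torch

torch.manual_seed(0)

# Toy stand-in for one FFN projection matrix (d_out x d_in).
W = torch.randn(16, 32, requires_grad=True)

# --- Phase 1: basis construction ---
# SVD of the weight gives an orthonormal basis U over output directions.
U, S, Vh = torch.linalg.svd(W.detach(), full_matrices=False)

# --- Phase 2: importance scoring ---
# Gradient of a (placeholder) safety loss w.r.t. W, scored per basis
# direction by how much gradient energy that direction carries.
x_safety = torch.randn(8, 32)
safety_loss = (x_safety @ W.t()).pow(2).mean()
(g,) = torch.autograd.grad(safety_loss, W)
scores = (U.t() @ g).pow(2).sum(dim=1)

# Freeze the top 60% most safety-critical directions ("freeze_60", assumed).
k = int(0.6 * scores.numel())
frozen = torch.zeros_like(scores, dtype=torch.bool)
frozen[scores.topk(k).indices] = True
P_frozen = U[:, frozen] @ U[:, frozen].t()  # projector onto the frozen span

# --- Phase 3: incremental learning with gradient masking ---
# During utility fine-tuning (e.g. on GSM8K), project the frozen component
# out of every gradient so safety-critical directions are never updated.
opt = torch.optim.SGD([W], lr=1e-2)
x_utility = torch.randn(8, 32)
utility_loss = (x_utility @ W.t() - 1.0).pow(2).mean()  # placeholder objective
opt.zero_grad()
utility_loss.backward()
with torch.no_grad():
    W.grad -= P_frozen @ W.grad  # mask the gradient in protected directions
opt.step()
```

After the masked step, the component of W inside the frozen subspace is unchanged, which is the mechanism the list above describes as protecting safety-critical weights while utility training continues.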

It was trained using the LibrAI/do-not-answer dataset for safety and openai/gsm8k for utility, resulting in a model that maintains strong refusal capabilities for harmful requests while improving performance on reasoning tasks.
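Both datasets are public on the Hugging Face Hub; a minimal loading sketch with the datasets library (the column names match the public dataset cards, but the model card does not specify the actual preprocessing):

```python
from datasets import load_dataset

# Safety data: questions a well-aligned model should refuse or deflect.
safety = load_dataset("LibrAI/do-not-answer", split="train")

# Utility data: grade-school math word problems ("main" config).
utility = load_dataset("openai/gsm8k", "main", split="train")

print(safety[0]["question"])  # do-not-answer stores prompts under "question"
print(utility[0]["answer"])   # GSM8K answers end with "#### <number>"
```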

Ideal Use Cases

This model is particularly well-suited for applications where:

  • Safety alignment is paramount: robust protection against generating harmful or undesirable content is required.
  • Balanced performance is needed: reasoning capability should not be sacrificed for safety.
  • Sensitive content must be handled: output moderation and ethical considerations are critical to the deployment.

Users should still evaluate outputs for specific use cases and consider additional safety measures as a best practice.