kmseong/llama2_7b-SSFT-WaRP_original_space_freeze_60

TEXT GENERATION · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Apr 30, 2026 · License: llama3.1 · Architecture: Transformer

The kmseong/llama2_7b-SSFT-WaRP_original_space_freeze_60 model is a fine-tune of Llama 3.1 8B Instruct by Min-Seong Kim, trained with the Safety-First WaRP (Weight space Rotation Process) method; despite the "llama2_7b" prefix in its name, the base model is the 8-billion-parameter meta-llama/Llama-3.1-8B-Instruct. The model targets safety alignment: it preserves refusal behavior on harmful requests while improving utility on reasoning tasks such as GSM8K, by protecting safety-critical weights with gradient masking during fine-tuning.


Model Overview

The kmseong/llama2_7b-SSFT-WaRP_original_space_freeze_60 model is a specialized fine-tune of the meta-llama/Llama-3.1-8B-Instruct base model, developed by Min-Seong Kim. It is trained with Safety-First WaRP (Weight space Rotation Process), a three-phase pipeline that strengthens safety alignment while preserving, and in places improving, utility.
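A minimal inference sketch, assuming the checkpoint loads through the standard Hugging Face transformers APIs and inherits the usual Llama 3.1 chat template from its base model; the prompt and generation settings below are illustrative, not prescribed by the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kmseong/llama2_7b-SSFT-WaRP_original_space_freeze_60"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed local dtype; the hosted endpoint serves FP8
    device_map="auto",
)

# Llama 3.1 Instruct checkpoints expose a chat template via the tokenizer.
messages = [{"role": "user", "content": "A hen lays 5 eggs a day. How many eggs do 12 hens lay in a week?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```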

Key Capabilities & Training

The training process proceeds in three phases (a toy code sketch follows this list):

  • Basis Construction: Identifying important neurons in FFN layers using safety data and SVD.
  • Importance Scoring: Calculating gradient-based importance scores to generate masks for critical directions.
  • Incremental Learning: Fine-tuning on utility tasks (like GSM8K) with gradient masking to protect safety-critical weights, ensuring a balanced safety-utility tradeoff.
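The model card does not include training code, so the sketch below is only a toy PyTorch illustration of the three phases on a single stand-in FFN matrix. Every name in it (W, the placeholder safety and utility losses) is an assumption, and reading the "freeze_60" suffix as "freeze the top 60% of basis directions" is a guess, not a documented fact:

```python
import torch

torch.manual_seed(0)

# Toy stand-in for one FFN projection matrix (d_out x d_in).
W = torch.randn(16, 32, requires_grad=True)

# --- Phase 1: basis construction ---
# SVD of the weight gives an orthonormal basis U over output directions.
U, S, Vh = torch.linalg.svd(W.detach(), full_matrices=False)

# --- Phase 2: importance scoring ---
# Gradient of a (placeholder) safety loss w.r.t. W, scored per basis
# direction by how much gradient energy that direction carries.
x_safety = torch.randn(8, 32)
safety_loss = (x_safety @ W.t()).pow(2).mean()
(g,) = torch.autograd.grad(safety_loss, W)
scores = (U.t() @ g).pow(2).sum(dim=1)

# Freeze the top 60% most safety-critical directions ("freeze_60", assumed).
k = int(0.6 * scores.numel())
frozen = torch.zeros_like(scores, dtype=torch.bool)
frozen[scores.topk(k).indices] = True
P_frozen = U[:, frozen] @ U[:, frozen].t()  # projector onto the frozen span

# --- Phase 3: incremental learning with gradient masking ---
# During utility fine-tuning (e.g. on GSM8K), project the frozen component
# out of every gradient so safety-critical directions are never updated.
opt = torch.optim.SGD([W], lr=1e-2)
x_utility = torch.randn(8, 32)
utility_loss = (x_utility @ W.t() - 1.0).pow(2).mean()  # placeholder objective
opt.zero_grad()
utility_loss.backward()
with torch.no_grad():
    W.grad -= P_frozen @ W.grad  # mask the gradient in protected directions
opt.step()
```

After the masked step, the component of W inside the frozen subspace is unchanged, which is the mechanism the list above describes as protecting safety-critical weights while utility training continues.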

It was trained using the LibrAI/do-not-answer dataset for safety and openai/gsm8k for utility, resulting in a model that maintains strong refusal capabilities for harmful requests while improving performance on reasoning tasks.
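Both datasets are public on the Hugging Face Hub; a minimal loading sketch with the datasets library (the column names match the public dataset cards, but the model card does not specify the actual preprocessing):

```python
from datasets import load_dataset

# Safety data: questions a well-aligned model should refuse or deflect.
safety = load_dataset("LibrAI/do-not-answer", split="train")

# Utility data: grade-school math word problems ("main" config).
utility = load_dataset("openai/gsm8k", "main", split="train")

print(safety[0]["question"])  # do-not-answer stores prompts under "question"
print(utility[0]["answer"])   # GSM8K answers end with "#### <number>"
```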

Ideal Use Cases

This model is particularly well-suited for applications where:

  • Safety alignment is paramount: robust protection against generating harmful or undesirable content is required.
  • Balanced performance is needed: reasoning capability should not be sacrificed for safety.
  • Sensitive content must be handled: output moderation and ethical considerations are critical to the deployment.

Users should still evaluate outputs for specific use cases and consider additional safety measures as a best practice.