kmseong/llama2_7b_chat-WaRP-safeinstr_ratio0.1_lr5e-5

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: May 3, 2026 | License: llama3.1 | Architecture: Transformer

The kmseong/llama2_7b_chat-WaRP-safeinstr_ratio0.1_lr5e-5 model is a 7-billion-parameter, Llama 2-based chat model fine-tuned by kmseong. It uses a Weight space Rotation Process (WaRP) for safety alignment, designed to preserve refusals of harmful requests while improving utility. It targets conversational applications that must balance safety against task performance, and it is intended to behave more safely than its base model.


Model Overview

This model, kmseong/llama2_7b_chat-WaRP-safeinstr_ratio0.1_lr5e-5, is a 7-billion-parameter, Llama 2-based chat model fine-tuned by kmseong. It applies a Weight space Rotation Process (WaRP) for safety alignment, with the goal of a safer instruction-following model that gives up little utility.
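Written out for a single linear layer, the rotation idea can be expressed as follows. This is a sketch under the assumption that the card's WaRP follows the original weight-space-rotation formulation from class-incremental few-shot learning: the basis comes from the covariance of the layer's inputs, importance is a first-order score on safety data, and a binary mask M (1 = trainable, 0 = protected) gates the utility update. The symbols are illustrative and do not come from the card.

```latex
% Linear layer y = W x, with W \in \mathbb{R}^{d_{out} \times d_{in}} and
% n collected inputs stacked in X \in \mathbb{R}^{n \times d_{in}}.
\frac{1}{n} X^\top X = U \Lambda U^\top, \qquad U^\top U = I

% Rotation is function-preserving:
y = W x = (W U)(U^\top x) = \tilde{W} \tilde{x}, \qquad \tilde{W} = W U

% Gradient and importance score in rotated coordinates:
\nabla_{\tilde{W}} \mathcal{L} = (\nabla_{W} \mathcal{L})\, U, \qquad
s = \left| \tilde{W} \odot \nabla_{\tilde{W}} \mathcal{L}_{\mathrm{safety}} \right|

% Masked update on utility data, then map back:
\tilde{W} \leftarrow \tilde{W} - \eta \left( M \odot \nabla_{\tilde{W}} \mathcal{L}_{\mathrm{utility}} \right), \qquad
W = \tilde{W} U^\top
% where M_{ij} = 0 for the highest-scoring entries of s and M_{ij} = 1 otherwise.
```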

Key Capabilities & Features

  • Safety-First WaRP Training: Employs a three-phase pipeline (Basis Construction, Importance Scoring, Incremental Learning) to explicitly protect safety mechanisms.
  • Gradient Masking: Masks gradients during fine-tuning so that the weight directions scored as most important for safety are left unchanged, with the aim of preserving refusals of harmful requests.
  • Improved Utility: Alongside its safety focus, the model is reported to gain utility on reasoning tasks, aiming for a balanced position on the safety-utility trade-off.
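  • Base Model: Listed as meta-llama/Llama-3.1-8B-Instruct, although the model name (llama2_7b_chat), the 7B parameter count, and the 4k context length all point to Llama 2 7B Chat, so the base-model attribution is uncertain.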
  • Dataset Usage: Trained on LibrAI/do-not-answer for safety alignment and openai/gsm8k for utility improvement.
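Phases 1 and 2 of the pipeline (Basis Construction, Importance Scoring) might look like the following for one linear layer. This is a minimal NumPy sketch, not the author's training code: it assumes the basis is the eigenbasis of the layer's input-activation covariance and that importance is the first-order score |W̃ ⊙ ∂L/∂W̃| in rotated coordinates. The function names `construct_basis` and `importance_scores` are hypothetical.

```python
import numpy as np


def construct_basis(activations: np.ndarray) -> np.ndarray:
    """Phase 1 (assumed): orthogonal basis from input-activation covariance.

    activations: (n_samples, d_in) inputs to one linear layer, e.g. collected
    on safety prompts. Returns U of shape (d_in, d_in) whose columns are
    eigenvectors sorted by decreasing eigenvalue.
    """
    cov = activations.T @ activations / activations.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    return eigvecs[:, order]


def importance_scores(weight: np.ndarray, grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Phase 2 (assumed): first-order importance in rotated coordinates.

    Because y = W x = (W U)(U^T x), the rotated weight is W U and, by the
    chain rule, its gradient is grad @ U.
    """
    rotated_w = weight @ basis
    rotated_g = grad @ basis
    return np.abs(rotated_w * rotated_g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(256, 8))  # stand-in for collected activations
    w = rng.normal(size=(4, 8))    # stand-in for a layer's weight
    g = rng.normal(size=(4, 8))    # stand-in for a safety-loss gradient
    u = construct_basis(x)
    # The rotation is function-preserving: W x == (W U)(U^T x).
    print(np.allclose(w @ x[0], (w @ u) @ (u.T @ x[0])))  # True
    print(importance_scores(w, g, u).shape)               # (4, 8)
```

Sorting the eigenvectors by eigenvalue is a convenience here; it makes the leading columns correspond to the input directions the layer sees most strongly.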

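The gradient masking in Phase 3 (Incremental Learning) can be sketched the same way. This is again an assumption-based NumPy illustration rather than the released training code: the hypothetical `protect_ratio` freezes that fraction of the highest-scoring rotated coordinates, and the card does not say what the `ratio0.1` in the model name refers to. `build_mask` and `masked_step` are hypothetical names.

```python
import numpy as np


def build_mask(scores: np.ndarray, protect_ratio: float) -> np.ndarray:
    """Return 0.0 on the top `protect_ratio` fraction of scores (protected,
    frozen) and 1.0 everywhere else (trainable)."""
    k = int(round(protect_ratio * scores.size))
    mask = np.ones_like(scores, dtype=float)
    if k > 0:
        # Flattened indices of the k largest scores.
        top = np.argsort(scores, axis=None)[::-1][:k]
        mask.flat[top] = 0.0
    return mask


def masked_step(weight, grad, basis, mask, lr):
    """One SGD step taken in the rotated space, then mapped back.

    Coordinates where mask == 0 keep their rotated value exactly, so the
    directions judged important for safety are not updated.
    """
    rotated_w = weight @ basis
    rotated_g = grad @ basis
    rotated_w = rotated_w - lr * mask * rotated_g
    return rotated_w @ basis.T  # back to the original parameterization


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # any orthogonal U
    w = rng.normal(size=(4, 5))
    scores = rng.random(size=(4, 5))
    mask = build_mask(scores, protect_ratio=0.1)
    new_w = masked_step(w, rng.normal(size=(4, 5)), basis, mask, lr=0.1)
    frozen = mask == 0
    print(int(frozen.sum()))  # 2
    print(np.allclose((new_w @ basis)[frozen], (w @ basis)[frozen]))  # True
```

Masking in the rotated space rather than on raw weights means a single frozen coordinate can protect a whole input direction spread across many original parameters.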
Good For

  • Applications requiring enhanced safety alignment in conversational AI.
  • Use cases where maintaining refusal capabilities for harmful prompts is critical.
  • Scenarios demanding a balance between model utility and safety performance.
  • Developers looking for a Llama 2-based model that pursues safety through weight-space manipulation.