kmseong/Llama-3.1-8B-base-gsm8k-SSFT_lr5e-5

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: May 5, 2026 | License: llama3.1 | Architecture: Transformer | Warm

The kmseong/Llama-3.1-8B-base-gsm8k-SSFT_lr5e-5 model is an 8-billion-parameter Llama 3.1 Instruct variant, fine-tuned by Min-Seong Kim using a Safety-First Weight space Rotation Process (WaRP). It is designed for safety alignment: it keeps the ability to refuse harmful requests while improving utility on reasoning tasks such as GSM8K. The safety-utility tradeoff is managed by protecting safety-related directions in weight space during incremental learning, so that gains in mathematical reasoning come with limited loss of refusal behavior.


Overview

This model, kmseong/Llama-3.1-8B-base-gsm8k-SSFT_lr5e-5, is an 8-billion-parameter Llama 3.1 Instruct model fine-tuned by Min-Seong Kim. It uses the Safety-First Weight space Rotation Process (WaRP), a three-phase pipeline intended to preserve safety alignment while improving utility on reasoning tasks.

Key Capabilities

  • Enhanced Safety Alignment: The WaRP methodology protects safety mechanisms by masking gradients on safety-critical directions during fine-tuning.
  • Refusal Capability: Maintains the ability to refuse harmful requests, a core aspect of its safety design.
  • Improved Utility on Reasoning Tasks: Fine-tuned on the GSM8K dataset to improve performance on mathematical reasoning.
  • Balanced Safety-Utility Tradeoff: The WaRP process is designed so that improvements in utility do not come at the expense of the model's safety behavior.

Training Methodology

The WaRP process involves:

  1. Basis Construction: Identifying important neurons in FFN layers using safety data and SVD.
  2. Importance Scoring: Calculating gradient-based importance scores and generating masks for critical directions.
  3. Incremental Learning: Fine-tuning on utility tasks (like GSM8K) with gradient masking to protect identified safety-critical directions.
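
The three phases above can be illustrated on a single toy weight matrix. The following is a minimal NumPy sketch under stated assumptions, not the author's implementation: it assumes the basis comes from the right singular vectors of safety-data activations, that importance is the absolute gradient of a safety loss in the rotated coordinates, and that a fixed fraction of coefficients is protected. The function names (`build_basis`, `importance_mask`, `masked_step`), the squared-error loss, and the `keep_ratio` value are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_basis(safety_acts):
    # Phase 1 (assumed form): the right singular vectors of the
    # (n_samples, d_in) activation matrix give an orthonormal basis.
    _, _, vt = np.linalg.svd(safety_acts, full_matrices=True)
    return vt.T  # shape (d_in, d_in); columns are basis directions

def importance_mask(grad_rot, keep_ratio):
    # Phase 2 (assumed form): protect the top `keep_ratio` fraction of
    # rotated coefficients, ranked by absolute gradient on safety data.
    scores = np.abs(grad_rot)
    k = max(1, int(round(keep_ratio * scores.size)))
    threshold = np.sort(scores, axis=None)[-k]
    return scores >= threshold  # True = protected

def masked_step(w_rot, grad_rot, mask, lr):
    # Phase 3: zero the gradient on protected coefficients, then take
    # one plain gradient-descent step on the rest.
    return w_rot - lr * np.where(mask, 0.0, grad_rot)

def mse_grad(w, x, y):
    # Gradient of 0.5 * mean squared error for the linear map x -> x @ w.T.
    return (x @ w.T - y).T @ x / len(x)

# Toy stand-ins for one FFN weight matrix and for the two datasets.
d_in, d_out = 6, 4
w = rng.normal(size=(d_out, d_in))
safety_x, safety_y = rng.normal(size=(32, d_in)), rng.normal(size=(32, d_out))
utility_x, utility_y = rng.normal(size=(32, d_in)), rng.normal(size=(32, d_out))

basis = build_basis(safety_x)
w_rot = w @ basis  # weights expressed in the rotated basis
mask = importance_mask(mse_grad(w, safety_x, safety_y) @ basis, keep_ratio=0.25)

w_rot_new = w_rot.copy()
for _ in range(10):
    w_current = w_rot_new @ basis.T  # rotate back to the original space
    grad_rot = mse_grad(w_current, utility_x, utility_y) @ basis
    w_rot_new = masked_step(w_rot_new, grad_rot, mask, lr=0.1)

w_new = w_rot_new @ basis.T  # updated weights in the original space
```

In this toy setting the protected coefficients of `w_rot` are left exactly unchanged by the utility updates, while the remaining coefficients are free to move. A real pipeline would apply the same idea per layer inside a training framework, and the actual scoring rule and masking threshold used for this model are not stated in the source.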

Datasets Used

  • Safety Data: LibrAI/do-not-answer
  • Utility Data: openai/gsm8k
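
Reference solutions in openai/gsm8k end with a line of the form `#### <number>`, which is commonly used to score answers by exact match. The sketch below shows one way to extract and compare final answers. The helper names `extract_final_answer` and `is_correct` are hypothetical, and the sketch assumes the model's output also ends with a `####` marker, which may not hold without prompting for that format.

```python
import re

def extract_final_answer(text):
    # Find the number following the "####" marker; tolerate thousands
    # separators, an optional sign, and an optional decimal part.
    match = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if match is None:
        return None
    return match.group(1).replace(",", "")

def is_correct(model_output, reference_answer):
    # Exact numeric match between the predicted and reference answers.
    predicted = extract_final_answer(model_output)
    gold = extract_final_answer(reference_answer)
    if predicted is None or gold is None:
        return False
    return float(predicted) == float(gold)
```

For example, `is_correct("48 + 24 = 72\n#### 72", "...\n#### 72")` returns `True`, and any output without a `####` marker counts as incorrect.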

Good For

  • Applications requiring a strong emphasis on safety and refusal of harmful content.
  • Tasks involving mathematical reasoning and problem-solving where safety is also a priority.
  • Developers looking for a Llama 3.1 variant with explicit safety alignment without significant degradation in utility.