kmseong/Llama-3.1-8B-base-gsm8k-warp-lr5e-5

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: May 5, 2026 | License: llama3.1 | Architecture: Transformer | Warm

kmseong/Llama-3.1-8B-base-gsm8k-warp-lr5e-5 is an 8-billion-parameter Llama 3.1 Instruct model fine-tuned by kmseong using the Safety-First WaRP (Weight space Rotation Process) method. It is designed for safety alignment: it maintains refusal capabilities for harmful requests while improving utility on reasoning tasks such as GSM8K. This balance of safety and performance makes it suitable for applications that require robust content moderation and mathematical problem-solving.

Overview

kmseong/Llama-3.1-8B-base-gsm8k-warp-lr5e-5 is an 8-billion-parameter Llama 3.1 Instruct model fine-tuned by kmseong. It uses Safety-First WaRP (Weight space Rotation Process), a three-phase pipeline designed to strengthen safety alignment while preserving utility.

Key Capabilities

  • Enhanced Safety Alignment: Uses the WaRP method to protect safety mechanisms through gradient masking, maintaining refusal capabilities for harmful requests.
  • Improved Reasoning Utility: Fine-tuned on the openai/gsm8k dataset to improve performance on utility tasks, particularly mathematical reasoning.
  • Balanced Safety-Utility Tradeoff: Balances safety and performance so the model stays useful for general tasks while remaining robust against unsafe content generation.
  • Gradient-Based Neuron Protection: Identifies and protects important neurons related to safety during incremental learning, preventing degradation of safety features.
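One way to check the reasoning-utility claim on your own data is to score model outputs against GSM8K reference solutions, which end with a line of the form `#### <answer>`. The sketch below is a minimal scorer under that assumption. The helper names `extract_final_number` and `is_correct` are hypothetical, and the card does not specify the model's output format, so the fallback of taking the last number in the text is a guess rather than the author's evaluation protocol.

```python
import re

# Matches integers and decimals, with optional sign and thousands separators.
_NUMBER = r"-?[\d,]*\.?\d+"


def extract_final_number(text):
    """Return the final numeric answer in `text` as a string, or None.

    Prefers a GSM8K-style '#### <answer>' marker; otherwise falls back to
    the last number that appears anywhere in the text.
    """
    marked = re.search(r"####\s*(" + _NUMBER + ")", text)
    if marked:
        candidate = marked.group(1)
    else:
        numbers = re.findall(_NUMBER, text)
        if not numbers:
            return None
        candidate = numbers[-1]
    return candidate.replace(",", "")


def is_correct(model_output, reference_answer):
    """Compare the final numbers of a model output and a reference solution."""
    predicted = extract_final_number(model_output)
    gold = extract_final_number(reference_answer)
    if predicted is None or gold is None:
        return False
    return abs(float(predicted) - float(gold)) < 1e-6
```

Averaging `is_correct` over a held-out split gives an exact-match accuracy figure; stricter protocols may require the model to emit the `####` marker itself.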

Training Details

Training followed three phases: (1) Basis Construction, which uses safety data and SVD to identify important neurons; (2) Importance Scoring, which scores them with gradient-based methods; and (3) Incremental Learning, which fine-tunes the model on GSM8K with gradient masking to protect safety-critical directions. Safety data came from LibrAI/do-not-answer and utility data from openai/gsm8k.
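The three phases can be illustrated on a single toy linear layer. The NumPy sketch below is an assumption-laden illustration, not the author's implementation: the function names, the choice of right singular vectors of safety activations as the basis, the absolute-gradient importance score, and the `protect_ratio` parameter are all hypothetical, and random arrays stand in for real activations and gradients.

```python
import numpy as np


def build_basis(safety_activations):
    """Phase 1 (sketch): orthonormal basis of the layer's input space via SVD."""
    _, _, vt = np.linalg.svd(safety_activations, full_matrices=True)
    return vt.T  # columns are basis directions, shape (d_in, d_in)


def importance_mask(safety_grad, basis, protect_ratio):
    """Phase 2 (sketch): freeze the coefficients with the largest |gradient|.

    Returns a mask in the rotated basis: 1 = trainable, 0 = protected.
    """
    scores = np.abs(safety_grad @ basis)
    k = int(round(protect_ratio * scores.size))
    if k == 0:
        return np.ones_like(scores)
    threshold = np.sort(scores, axis=None)[-k]
    return (scores < threshold).astype(scores.dtype)


def masked_update(weight, utility_grad, basis, mask, lr):
    """Phase 3 (sketch): one SGD step with protected directions masked out."""
    rotated_grad = utility_grad @ basis      # express gradient in the basis
    masked_grad = rotated_grad * mask        # zero protected coefficients
    return weight - lr * (masked_grad @ basis.T)  # rotate back and step


# Toy demonstration with random stand-ins for real data.
rng = np.random.default_rng(0)
d_out, d_in = 3, 4
w = rng.normal(size=(d_out, d_in))
basis = build_basis(rng.normal(size=(16, d_in)))
mask = importance_mask(rng.normal(size=(d_out, d_in)), basis, protect_ratio=0.25)
new_w = masked_update(w, rng.normal(size=(d_out, d_in)), basis, mask, lr=0.1)
```

Because the basis is orthonormal, the weight change expressed in that basis is exactly zero at every protected coefficient, which is the property gradient masking is meant to guarantee.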

Ideal Use Cases

This model is particularly well-suited for applications where both robust safety alignment and strong reasoning capabilities are critical. It can be deployed in scenarios requiring content moderation, safe AI assistants, or educational tools that need to solve mathematical problems reliably while adhering to safety guidelines.
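For local experimentation, a loading sketch with the Hugging Face `transformers` library might look like the following. It assumes the weights are published on the Hugging Face Hub under the model ID above, that `torch`, `transformers`, and `accelerate` are installed, and that a plain question-and-answer prompt is acceptable; the card does not document a prompt format, so `build_prompt` is a hypothetical helper.

```python
MODEL_ID = "kmseong/Llama-3.1-8B-base-gsm8k-warp-lr5e-5"


def build_prompt(question):
    """Hypothetical plain-text prompt; the card does not specify a template."""
    return f"Question: {question}\nAnswer:"


def generate_answer(question, max_new_tokens=256):
    """Load the model and greedily decode an answer to `question`."""
    # Imported lazily so build_prompt works without these packages installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # requires the accelerate package
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("A farmer has 12 eggs and sells 5. How many are left?")` downloads the checkpoint on first use, so expect a large download and a GPU with enough memory for an 8B model.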