kmseong/Llama-3.1-8B-base-gsm8k-SSFT_lr1e-5

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: May 6, 2026 | License: llama3.1 | Architecture: Transformer | Warm

kmseong/Llama-3.1-8B-base-gsm8k-SSFT_lr1e-5 is an 8-billion-parameter Llama 3.1 Instruct variant, fine-tuned by kmseong with the Safety-First Weight space Rotation Process (WaRP). The method is designed to preserve safety alignment while improving utility on reasoning tasks, and the model shows improved performance on the GSM8K dataset. Gradient masking protects the safety mechanisms during fine-tuning so that the model keeps refusing harmful requests, which makes it suitable for applications that need both safety and reasoning ability.


Overview

This model is an 8-billion-parameter Llama 3.1 Instruct model fine-tuned by kmseong. Its distinguishing feature is the Safety-First Weight space Rotation Process (WaRP), a three-phase training pipeline designed to balance safety alignment against utility.

Key Capabilities

  • Enhanced Safety Alignment: WaRP constructs an orthonormal basis from safety data, identifies important neurons, and applies gradient masking during fine-tuning. This protects safety mechanisms and maintains the model's ability to refuse harmful requests.
  • Improved Reasoning Utility: After the safety-critical directions are identified, the model is fine-tuned incrementally on the openai/gsm8k dataset, which improves its performance on mathematical reasoning tasks.
  • Balanced Safety-Utility Tradeoff: WaRP is designed so that utility improvements do not compromise the model's safety behavior, yielding a model that is both safer and more capable in specific reasoning domains.

Training Details

The training involved three phases:

  1. Basis Construction: Collected activations from FFN layers on the LibrAI/do-not-answer safety data and computed an SVD of them to identify important neurons.
  2. Importance Scoring: Calculated gradient-based importance scores and generated masks for the critical directions.
  3. Incremental Learning: Fine-tuned on the openai/gsm8k utility task with gradient masking to protect the identified safety-critical directions.
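
The three phases above can be illustrated with a small NumPy sketch on a single toy linear layer. This is not the author's implementation. The function names (`build_basis`, `importance_mask`, `masked_step`, `mse_grad_rotated`), the squared-error loss, the random data, and the `keep_ratio` and `lr` values are all assumptions chosen for illustration. The sketch only shows the general idea: build an orthonormal basis from activations, score coefficients by gradient magnitude on safety data, then update only the unmasked coefficients on the utility task.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_basis(activations):
    """Phase 1 (sketch): orthonormal basis of the input space from an SVD of activations."""
    # Rows of vt are orthonormal right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(activations, full_matrices=True)
    return vt.T  # columns are basis directions, shape (d_in, d_in)

def mse_grad_rotated(w_rotated, basis, x, targets):
    """Gradient of 0.5 * mean squared error w.r.t. the weights expressed in the basis."""
    z = x @ basis                    # inputs expressed in the basis, (n, d_in)
    err = z @ w_rotated.T - targets  # prediction error, (n, d_out)
    return err.T @ z / len(x)        # (d_out, d_in)

def importance_mask(grad_rotated, keep_ratio):
    """Phase 2 (sketch): freeze the top `keep_ratio` fraction of coefficients by |gradient|."""
    scores = np.abs(grad_rotated)
    k = max(1, int(round(keep_ratio * scores.size)))
    threshold = np.sort(scores, axis=None)[-k]
    return scores >= threshold       # True marks a protected coefficient

def masked_step(w_rotated, grad_rotated, frozen, lr):
    """Phase 3 (sketch): gradient step that leaves protected coefficients untouched."""
    return w_rotated - lr * np.where(frozen, 0.0, grad_rotated)

# Toy stand-ins for safety data, utility data, and one layer's weights (all hypothetical).
safety_x, safety_t = rng.normal(size=(64, 8)), rng.normal(size=(64, 4))
utility_x, utility_t = rng.normal(size=(64, 8)), rng.normal(size=(64, 4))
w = rng.normal(size=(4, 8))

basis = build_basis(safety_x)
w_rot = w @ basis                    # weights re-expressed in the basis
safety_grad = mse_grad_rotated(w_rot, basis, safety_x, safety_t)
frozen = importance_mask(safety_grad, keep_ratio=0.25)

w_rot_new = w_rot
for _ in range(10):
    utility_grad = mse_grad_rotated(w_rot_new, basis, utility_x, utility_t)
    w_rot_new = masked_step(w_rot_new, utility_grad, frozen, lr=0.1)

w_new = w_rot_new @ basis.T          # rotate back to the original weight space
```

Because the basis is orthonormal, rotating the weights into it and back is lossless, so the mask restricts which directions the utility update may change without altering what the layer can represent.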

Good For

  • Applications that require a strong emphasis on safety and refusal of harmful content.
  • Use cases where mathematical reasoning matters, particularly GSM8K-style problem solving.
  • Developers looking for a Llama 3.1 8B Instruct model with explicit safety alignment and without significant degradation in utility.