kmseong/Llama-3.1-8B-base-gsm8k-safeinstr-lr5e-5-ratio0.1

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:May 5, 2026License:llama3.1Architecture:Transformer Warm

The kmseong/Llama-3.1-8B-base-gsm8k-safeinstr-lr5e-5-ratio0.1 model is an 8 billion parameter Llama 3.1 Instruct variant, fine-tuned by kmseong using the Safety-First WaRP (Weight space Rotation Process) method. This model is specifically designed to enhance safety alignment and refusal capabilities for harmful requests while improving utility on reasoning tasks like GSM8K. It balances safety and performance, making it suitable for applications requiring robust content moderation and mathematical problem-solving.

Loading preview...

Overview

This model, kmseong/Llama-3.1-8B-base-gsm8k-safeinstr-lr5e-5-ratio0.1, is an 8 billion parameter instruction-tuned variant of Meta's Llama 3.1. Developed by kmseong, its primary distinction lies in its safety alignment achieved through a novel Safety-First WaRP (Weight space Rotation Process) three-phase training pipeline. This method focuses on protecting safety mechanisms while simultaneously improving performance on utility tasks.

Key Capabilities

  • Enhanced Safety Alignment: Utilizes a sophisticated WaRP process to maintain strong refusal capabilities for harmful requests.
  • Improved Reasoning Utility: Fine-tuned on the GSM8K dataset, demonstrating improved performance on mathematical and general reasoning tasks.
  • Balanced Safety-Utility Tradeoff: Designed to prevent degradation of utility performance while enforcing safety, achieved through gradient masking during incremental learning.
  • Robust Training Methodology: Involves basis construction from safety data, importance scoring using gradient-based methods, and incremental learning with protected directions.

Good For

  • Applications requiring a strong balance between content safety and general reasoning capabilities.
  • Use cases where refusal of harmful prompts is critical, such as chatbots or content moderation systems.
  • Tasks involving mathematical problem-solving or logical reasoning, benefiting from its GSM8K fine-tuning.

This model is built upon the meta-llama/Llama-3.1-8B-Instruct base and incorporates safety data from LibrAI/do-not-answer and utility data from openai/gsm8k.