kmseong/llama3_2_3b-instruct-WaRP_lr5e-5

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 28, 2026 · License: llama3.1 · Architecture: Transformer

kmseong/llama3_2_3b-instruct-WaRP_lr5e-5 is a 3.2 billion parameter instruction-tuned model based on Llama 3.2 3B Instruct, fine-tuned by kmseong with the Weight space Rotation Process (WaRP) for stronger safety alignment. The model is designed to keep refusing harmful requests while improving utility on reasoning tasks, offering a balanced safety-utility tradeoff. Its primary use case is applications that need robust safety mechanisms alongside general instruction following.


Overview

kmseong/llama3_2_3b-instruct-WaRP_lr5e-5 is a 3.2 billion parameter model derived from Llama 3.2 3B Instruct, fine-tuned by kmseong using the Weight space Rotation Process (WaRP). This three-phase training pipeline focuses on safety alignment, aiming for a model that reliably refuses harmful requests while preserving its utility on general tasks.
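
Assuming the checkpoint ships the standard Llama instruct chat template (as its base model does), it can be loaded like any other causal language model from the Hugging Face Hub. Below is a minimal inference sketch using the transformers library; the prompt and generation settings are illustrative assumptions, not recommendations from the model author.

```python
# Minimal inference sketch; requires `torch` and `transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kmseong/llama3_2_3b-instruct-WaRP_lr5e-5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"},
]

# Build the prompt with the chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```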

Key Capabilities

  • Enhanced Safety Alignment: Uses a Safety-First WaRP method that protects safety mechanisms through gradient masking (see the sketch after this list).
  • Harmful Request Refusal: Maintains strong refusal capabilities when confronted with unsafe or harmful prompts.
  • Balanced Safety-Utility: Improves utility on reasoning tasks (e.g., GSM8K) while preserving refusal behavior.
  • Targeted Fine-tuning: Employs a multi-phase training approach involving basis construction, importance scoring, and incremental learning to precisely align the model.
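
The gradient-masking step mentioned above can be illustrated with a toy example. The sketch below is a simplified illustration on a stand-in linear layer, not the author's implementation: using squared gradients on a "safety" batch as importance scores, the fixed 20% protection threshold, and the optimizer choice are all assumptions, and the basis-construction (weight space rotation) phase is omitted entirely.

```python
# Toy sketch of importance-based gradient masking (all specifics are assumptions).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 4)          # stand-in for one transformer weight matrix
loss_fn = nn.CrossEntropyLoss()

# Importance scoring: accumulate squared gradients from a batch of
# safety-critical examples.
safety_x, safety_y = torch.randn(32, 16), torch.randint(0, 4, (32,))
model.zero_grad()
loss_fn(model(safety_x), safety_y).backward()
importance = {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}

# Build masks that freeze the most safety-critical 20% of each tensor.
masks = {}
for name, score in importance.items():
    k = max(1, int(0.2 * score.numel()))
    threshold = score.flatten().topk(k).values.min()
    masks[name] = (score < threshold).float()   # 1 = trainable, 0 = protected

# Incremental learning on "utility" data, with gradients on the protected
# coordinates zeroed out before every optimizer step.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # lr suggested by the model name
utility_x, utility_y = torch.randn(32, 16), torch.randint(0, 4, (32,))
for _ in range(10):
    optimizer.zero_grad()
    loss_fn(model(utility_x), utility_y).backward()
    for name, param in model.named_parameters():
        param.grad.mul_(masks[name])            # mask out protected weights
    optimizer.step()
```

Masking individual gradient coordinates, rather than freezing whole layers, lets utility fine-tuning reach most of the network while leaving the weights most implicated in refusal behavior unchanged.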

Good for

  • Applications requiring a high degree of safety and refusal for harmful content.
  • Use cases where balancing utility and safety is critical.
  • Scenarios needing a smaller, instruction-tuned model with robust safety mechanisms for general reasoning and instruction following.