kmseong/safety-warp-Llama-3.2-3b-phase3-no_rotation-per-layer2

Hugging Face

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 26, 2026 · License: llama3.2 · Architecture: Transformer

kmseong/safety-warp-Llama-3.2-3b-phase3-no_rotation-per-layer2 is a 3.2-billion-parameter model based on Llama-3.2. It applies a weight-space rotation process that modifies the attention projections (q, k, v) and the MLP projections (up, down) on a per-layer basis. The model targets safety alignment, refining its behavior through non-freeze training after the initial modifications.


Model Overview

kmseong/safety-warp-Llama-3.2-3b-phase3-no_rotation-per-layer2 is built on the Llama-3.2 architecture with 3.2 billion parameters. Its core technique is a "Weight space Rotation Process" (Warp) aimed at enhancing safety alignment: each layer's attention projections (query, key, value) and MLP projections (up, down) receive their own modifications.
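The repository does not publish the modification code, but the targeted weights can be identified by name in a Hugging Face Llama checkpoint, where the attention projections are `q_proj`/`k_proj`/`v_proj` and the MLP projections are `up_proj`/`down_proj`. The sketch below groups those parameter names by layer; the toy state-dict keys are illustrative, not read from this checkpoint.

```python
# Select the per-layer weights the process targets: attention q/k/v
# projections and MLP up/down projections. Key names follow the
# Hugging Face Llama convention; the example keys are illustrative.

TARGET_SUFFIXES = (
    "self_attn.q_proj.weight",
    "self_attn.k_proj.weight",
    "self_attn.v_proj.weight",
    "mlp.up_proj.weight",
    "mlp.down_proj.weight",
)

def warp_targets(param_names):
    """Group targeted parameter names by layer index."""
    per_layer = {}
    for name in param_names:
        if name.endswith(TARGET_SUFFIXES):
            # e.g. "model.layers.5.self_attn.q_proj.weight" -> layer 5
            layer = int(name.split("model.layers.")[1].split(".")[0])
            per_layer.setdefault(layer, []).append(name)
    return per_layer

# Toy example with two layers (o_proj is present but not targeted):
names = [
    f"model.layers.{i}.{suffix}"
    for i in range(2)
    for suffix in TARGET_SUFFIXES + ("self_attn.o_proj.weight",)
]
targets = warp_targets(names)
print(len(targets[0]))  # 5 targeted weights per layer
```

In practice the same filtering would run over `model.state_dict().keys()` after loading the checkpoint with `transformers`.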

Key Characteristics

  • Architecture: Llama-3.2 base model with 3.2 billion parameters.
  • Safety Alignment: Utilizes a novel "Weight space Rotation Process" for safety-focused fine-tuning.
  • Layer-Specific Modifications: Applies unique adjustments to attention (q,k,v) and MLP (up, down) components for each layer.
  • Training Approach: Employs a non-freeze training strategy after the initial per-layer modifications, meaning all weights remain trainable rather than being frozen, allowing the whole network to adapt.
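The card does not spell out the rotation math, but a common property motivating weight-space rotations is that an orthogonal matrix applied consistently to paired projections leaves certain activations unchanged. The NumPy sketch below is my own illustration of that general idea, not this model's actual procedure: rotating the q and k projection weights by the same orthogonal matrix preserves the attention scores.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 8, 4

# Toy q/k projection weights and a random orthogonal rotation R.
Wq = rng.normal(size=(d_head, d_model))
Wk = rng.normal(size=(d_head, d_model))
R, _ = np.linalg.qr(rng.normal(size=(d_head, d_head)))  # R is orthogonal

x = rng.normal(size=(3, d_model))  # 3 token embeddings

# Attention scores before rotation: (x Wq^T)(x Wk^T)^T
scores = (x @ Wq.T) @ (x @ Wk.T).T

# Rotate both projections in weight space: W -> R W.
scores_rot = (x @ (R @ Wq).T) @ (x @ (R @ Wk).T).T

# Since R^T R = I, the scores are unchanged.
print(np.allclose(scores, scores_rot))  # True
```

This invariance is what makes weight-space edits attractive: behavior-preserving rotations can be composed with targeted changes so only the intended aspects of the model shift.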

Potential Use Cases

This model is particularly suited for applications requiring:

  • Safety-critical AI deployments: Where robust safety alignment is a primary concern.
  • Research into model safety: Investigating the effects of weight space manipulation on model behavior.
  • Development of safer language models: As a foundation for building applications that prioritize ethical and harmless outputs.