kmseong/llama3.2_3b_new_SSFT_lr3e-5

TEXT GENERATIONConcurrency Cost:1Model Size:3.2BQuant:BF16Ctx Length:32kPublished:Apr 4, 2026License:llama3.2Architecture:Transformer Cold

The kmseong/llama3.2_3b_new_SSFT_lr3e-5 is a 3.2 billion parameter Llama 3.2 architecture model developed by kmseong, specifically fine-tuned for safety using the Safety-WaRP (Weight space Rotation Process) method. This Phase 0 model, trained on the Circuit Breakers dataset, establishes core safety mechanisms to provide refusal responses to harmful prompts. It is designed as a foundational safety layer, with a 32768 token context length, intended for further development to balance safety with utility.

Loading preview...

Model Overview

The kmseong/llama3.2_3b_new_SSFT_lr3e-5 is a 3.2 billion parameter model based on the Llama 3.2 architecture, developed by kmseong. This model represents Phase 0: Base Safety Training of the Safety-WaRP (Weight space Rotation Process) pipeline. Its primary objective is to establish fundamental safety mechanisms.

Key Capabilities & Training

  • Safety-Focused Fine-tuning: The model has undergone fine-tuning using the Circuit Breakers dataset, specifically to learn safety responses and refuse harmful prompts.
  • Safety-WaRP Method: Utilizes the Weight space Rotation Process for safety training, focusing on building "circuit breakers" against unsafe content.
  • Base Safety Layer: Serves as a foundational model with inherent safety responses, intended to be the basis for subsequent phases that restore utility.
  • Training Configuration: Trained for 3 epochs with a learning rate of 1e-5, employing gradient accumulation and an 8-bit optimizer for memory efficiency.

Important Considerations

  • Phase 0 Status: As a Phase 0 model, its safety capabilities are established, but its general utility (e.g., mathematical reasoning, complex problem-solving) may be reduced. This is an expected trade-off at this stage of the WaRP pipeline.
  • Future Development: This model is designed to be the starting point for further development in Phase 1 (Basis Construction), Phase 2 (Importance Scoring), and Phase 3 (Incremental Learning) to achieve a balanced model with both strong safety and restored utility.

When to Use This Model

This model is suitable for use cases where:

  • A strong initial safety layer is paramount, even if it means a temporary reduction in general utility.
  • You are building a system that requires a base model resistant to generating harmful content.
  • You plan to continue the WaRP pipeline or further fine-tune the model to restore specific utility while maintaining safety.