kmseong/llama3_2_3b_instruct_only_rsn_tuned_lr5e-5

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:3.2BQuant:BF16Ctx Length:32kPublished:Apr 28, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

kmseong/llama3_2_3b_instruct_only_rsn_tuned_lr5e-5 is a 3.2 billion parameter Llama-3.2-3B-Instruct model, fine-tuned by kmseong using the Safety Neuron Tuning (SN-Tune) method. This model is specifically enhanced for safety alignment by selectively fine-tuning only critical safety neurons on the Circuit Breakers dataset. It maintains general capabilities while offering improved safety, making it suitable for applications requiring robust content moderation and responsible AI interactions.

Loading preview...

Overview

This model, kmseong/llama3_2_3b_instruct_only_rsn_tuned_lr5e-5, is a 3.2 billion parameter instruction-tuned variant of the Llama-3.2-3B-Instruct base model. It has been fine-tuned by kmseong using a novel approach called Safety Neuron Tuning (SN-Tune).

Key Capabilities

  • Enhanced Safety Alignment: The primary focus of this model is improved safety. It was fine-tuned on the Circuit Breakers dataset, which is designed for safety alignment.
  • Parameter-Efficient Fine-tuning: SN-Tune selectively identifies and fine-tunes only a small subset of "safety neurons" while freezing other parameters. This method is highly efficient.
  • Minimal Impact on General Capabilities: By targeting only safety-critical neurons, the fine-tuning process aims to enhance safety without significantly degrading the model's broader performance or general instruction-following abilities.

What Makes This Model Different?

Unlike traditional fine-tuning that adjusts many parameters, SN-Tune offers a unique approach to safety. It specifically isolates and trains only those neurons deemed critical for safety responses. This makes it a specialized model for use cases where robust safety alignment is paramount, providing a more controlled and efficient way to instill safety guardrails compared to broader fine-tuning methods.

Good For

  • Applications requiring a safety-aligned language model.
  • Scenarios where responsible AI interactions are crucial.
  • Developers looking for a model with enhanced content moderation capabilities without a large performance overhead.