kmseong/Llama-3.2-3B-only-sn-tuned

Text Generation · Open Weights · Warm

  • Model Size: 3.2B
  • Quant: BF16
  • Ctx Length: 32k
  • Concurrency Cost: 1
  • Published: Mar 19, 2026
  • License: apache-2.0
  • Architecture: Transformer

The kmseong/Llama-3.2-3B-only-sn-tuned model is a 3.2 billion parameter Llama-3.2-3B-Instruct variant, developed by kmseong, specifically fine-tuned using the Safety Neuron Tuning (SN-Tune) method. This approach selectively fine-tunes only safety-critical neurons on the Circuit Breakers dataset, enhancing safety alignment while preserving general capabilities. It is designed for applications requiring improved safety performance with minimal impact on the base model's original functionalities.


Overview

This model, kmseong/Llama-3.2-3B-only-sn-tuned, is a specialized version of the meta-llama/Llama-3.2-3B-Instruct base model, featuring 3.2 billion parameters and a 32768 token context length. It has been fine-tuned by kmseong using a novel technique called Safety Neuron Tuning (SN-Tune).

What is SN-Tune?

SN-Tune is a parameter-efficient fine-tuning method that focuses exclusively on enhancing model safety. It operates by:

  • Detecting safety neurons: Identifying a small subset of neurons within the model that are critical for safety-related responses.
  • Freezing non-safety parameters: All other model parameters are kept static.
  • Fine-tuning only safety neurons: These identified safety neurons are then fine-tuned using dedicated safety alignment data, specifically the Circuit Breakers dataset.
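The three steps above can be sketched in PyTorch on a toy layer. This is a minimal illustration, not the SN-Tune implementation: the toy MLP stands in for one transformer FFN block, and the `safety_neurons` indices are placeholders for whatever the detection step would actually return.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy MLP standing in for one transformer FFN layer.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))

# Step 1 (assumed result): indices of detected "safety neurons"
# in the hidden layer. Placeholder values for illustration.
safety_neurons = torch.tensor([2, 5, 11])

# Step 2: freeze all parameters.
for p in model.parameters():
    p.requires_grad = False

# Step 3: re-enable gradients only on the weight rows that feed the
# safety neurons, masking out gradients for every other row.
ffn_in = model[0]
ffn_in.weight.requires_grad = True
mask = torch.zeros_like(ffn_in.weight)
mask[safety_neurons] = 1.0
ffn_in.weight.register_hook(lambda g: g * mask)

before = ffn_in.weight.detach().clone()

# One SGD step on a dummy batch standing in for safety alignment data.
opt = torch.optim.SGD([ffn_in.weight], lr=0.1)
x, y = torch.randn(4, 8), torch.randn(4, 8)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()

# Only rows belonging to the selected safety neurons can move.
changed_rows = (ffn_in.weight != before).any(dim=1).nonzero(as_tuple=True)[0]
print(changed_rows.tolist())
```

With plain SGD and the gradient mask, every row outside `safety_neurons` receives an exactly-zero gradient and is therefore bit-identical after the step, which is the property SN-Tune relies on to preserve the rest of the model.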

Key Capabilities and Benefits

  • Enhanced Safety Alignment: Provides improved safety performance compared to its base model by directly targeting and optimizing safety-critical components.
  • Minimal Impact on General Capabilities: By freezing most parameters, SN-Tune aims to preserve the base model's original performance across general tasks.
  • Parameter-Efficient Fine-tuning: The selective tuning of only a small subset of neurons makes the fine-tuning process highly efficient.
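To get a rough sense of why tuning only a neuron subset is efficient, the fraction of FFN weights touched can be estimated with back-of-the-envelope arithmetic. The shapes below loosely follow Llama-3.2-3B (hidden size 3072, FFN size 8192, 28 layers), and the per-layer safety-neuron count is a made-up illustrative number, not a figure reported for this model.

```python
# Illustrative Llama-3.2-3B-like shapes (assumptions, not model-card facts).
hidden, ffn, num_layers = 3072, 8192, 28

# Gate, up, and down projections give ~3 * hidden * ffn FFN weights per layer.
total_ffn_weights = num_layers * 3 * hidden * ffn

# Hypothetical detected subset: 64 safety neurons per layer, each owning
# one row/column in each of the three projections.
safety_neurons_per_layer = 64
tuned_weights = num_layers * safety_neurons_per_layer * 3 * hidden

fraction = tuned_weights / total_ffn_weights
print(f"tuned fraction of FFN weights: {fraction:.4%}")
```

Under these assumptions the tuned share is simply `64 / 8192`, i.e. well under one percent of the FFN weights, which is what makes the fine-tuning pass cheap relative to full fine-tuning.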

Use Cases

This model is particularly suitable for applications where:

  • Improved safety and reduced harmful outputs are paramount.
  • Maintaining the broad capabilities of the Llama-3.2-3B-Instruct base model is desired, without extensive retraining.
  • Resource-efficient safety enhancements are a priority.