kmseong/Llama-3.2-3B-gsm8k-ft-after-rsn-tuned-freeze-sn

Text generation · 3B parameters · BF16 · 32k context length · License: apache-2.0 · Architecture: Transformer (open weights)

The kmseong/Llama-3.2-3B-gsm8k-ft-after-rsn-tuned-freeze-sn model is a 3-billion-parameter variant of meta-llama/Llama-3.2-3B-Instruct, fine-tuned with the Safety Neuron Tuning (SN-Tune) method. SN-Tune identifies safety-critical neurons and fine-tunes only those neurons on the Circuit Breakers dataset, strengthening safety alignment while leaving the rest of the model frozen. The result is a parameter-efficient safety improvement that preserves general capabilities, making the model suitable for applications that require robust safety behavior.


Model Overview

This model, kmseong/Llama-3.2-3B-gsm8k-ft-after-rsn-tuned-freeze-sn, is a specialized version of the meta-llama/Llama-3.2-3B-Instruct base model. It has been fine-tuned using a novel technique called SN-Tune (Safety Neuron Tuning), which focuses on enhancing the model's safety alignment.
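
The checkpoint loads with the standard transformers API. The snippet below is a minimal inference sketch; the prompt and generation settings are illustrative choices, not values documented for this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kmseong/Llama-3.2-3B-gsm8k-ft-after-rsn-tuned-freeze-sn"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# Llama-3.2-Instruct variants expect the chat template.
messages = [{"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```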

Key Capabilities & Features

  • Enhanced Safety Alignment: The primary goal of this model is to improve safety performance compared to its base counterpart.
  • SN-Tune Methodology: This fine-tuning approach (sketched in code after this list) involves:
    • Detecting specific "safety neurons" within the model.
    • Freezing all other parameters to preserve general capabilities.
    • Fine-tuning only these safety neurons on dedicated safety data (Circuit Breakers dataset).
  • Parameter-Efficient Fine-tuning: By only adjusting a small subset of neurons, the SN-Tune method offers an efficient way to instill safety without extensive retraining.
  • Minimal Impact on General Capabilities: The freezing of non-safety parameters aims to ensure that the safety alignment does not degrade the model's broader performance.
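
The sketch below illustrates this freeze-then-tune pattern on the base model. It is a simplified, hypothetical rendering: the safety-neuron indices are placeholders, and the actual detection procedure and training loop follow the SN-Tune method rather than what is shown here.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# 1. Freeze every parameter to preserve general capabilities.
for param in model.parameters():
    param.requires_grad = False

# 2. Hypothetical output of the safety-neuron detection step:
#    {decoder layer index: MLP neuron indices flagged as safety-critical}.
safety_neurons = {5: [17, 230, 981], 12: [44, 512]}

# 3. Re-enable gradients only for the weights tied to those neurons.
#    A gradient mask keeps the rest of each matrix effectively frozen.
for layer_idx, neuron_ids in safety_neurons.items():
    down_proj = model.model.layers[layer_idx].mlp.down_proj
    down_proj.weight.requires_grad = True
    mask = torch.zeros_like(down_proj.weight)
    mask[:, neuron_ids] = 1.0  # down_proj columns index MLP neurons
    down_proj.weight.register_hook(lambda grad, m=mask: grad * m)

# 4. Optimize only the (small) trainable subset; the training loop would
#    iterate over dedicated safety data such as the Circuit Breakers dataset.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```

Only the masked columns receive gradient updates, which is what makes the approach parameter-efficient: everything outside the detected safety neurons stays at its base-model values.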

When to Use This Model

This model is particularly well-suited for use cases where:

  • Safety is a critical concern: Applications requiring a higher degree of safety alignment in their language model outputs.
  • Resource efficiency is important: Developers looking for safety improvements without the computational cost of full model fine-tuning.
  • Maintaining base model performance is desired: When the general capabilities of Llama-3.2-3B-Instruct are sufficient, but an added layer of safety is needed.