kmseong/llama3.2_3b_only_rsn_tuned_lr3e-5

Text generation · Concurrency cost: 1 · Model size: 3.2B · Quantization: BF16 · Context length: 32k · Published: Apr 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

kmseong/llama3.2_3b_only_rsn_tuned_lr3e-5 is a 3.2 billion parameter fine-tune of Llama-3.2-3B-Instruct by kmseong, trained with the Safety Neuron Tuning (SN-Tune) method. SN-Tune strengthens safety alignment by updating only the neurons identified as safety-critical, using the Circuit Breakers dataset, while leaving the rest of the network frozen. The model retains the base model's general capabilities while improving safety behavior, making it suitable for applications that require robust safety features.


Model Overview

This model, kmseong/llama3.2_3b_only_rsn_tuned_lr3e-5, is Llama-3.2-3B-Instruct (3.2 billion parameters) fine-tuned by kmseong with Safety Neuron Tuning (SN-Tune), a method designed specifically to enhance safety alignment.

Key Capabilities & Features

  • Enhanced Safety Alignment: Utilizes SN-Tune to improve safety by selectively fine-tuning only "safety neurons" on the Circuit Breakers dataset.
  • Parameter-Efficient Fine-tuning: Freezes non-safety parameters, allowing for efficient training while minimizing impact on the model's general capabilities.
  • Base Model: Built upon meta-llama/Llama-3.2-3B-Instruct, inheriting its foundational language understanding and generation abilities.
  • Context Length: Supports a context length of 32768 tokens.
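The model can be loaded like any other Llama-3.2 checkpoint. The sketch below is a minimal, hedged example using Hugging Face `transformers`; it assumes the checkpoint is hosted under the id above and inherits the base model's chat template (the helper name `chat` is ours, not part of the repository).

```python
MODEL_ID = "kmseong/llama3.2_3b_only_rsn_tuned_lr3e-5"  # id from this card

def chat(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a reply with the model's chat template.

    Requires `transformers` and `torch`; downloads the BF16 weights
    (roughly 6 GB) on first use.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Format the prompt with the inherited Llama-3.2 chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Typical usage would be `print(chat("How should I report a phishing email?"))`.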

What Makes This Model Different?

Unlike traditional fine-tuning that adjusts all parameters, SN-Tune isolates and targets only those neurons critical for safety. This approach ensures that the model's core functionalities remain largely intact while significantly boosting its safety profile. It offers a practical solution for developers seeking to deploy smaller, safer LLMs without extensive retraining.
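The selective-update idea can be sketched in a few lines of PyTorch. This is an illustrative toy, not the authors' code: it assumes "safety neurons" correspond to rows of a layer's weight matrix, and the indices below are made up; SN-Tune's actual neuron-selection procedure comes from the method's paper. Gradients for all other rows are zeroed with a hook, so an ordinary optimizer step updates only the chosen neurons.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for one transformer MLP layer; in practice the same masking
# would be applied to the real model's layers.
layer = nn.Linear(16, 32)

# Hypothetical "safety neuron" indices (output rows) -- for illustration only.
safety_neurons = torch.tensor([3, 7, 19])
mask = torch.zeros(layer.out_features, dtype=torch.bool)
mask[safety_neurons] = True

# Zero the gradient of every non-safety row, effectively freezing it.
layer.weight.register_hook(lambda g: g * mask.unsqueeze(1))
layer.bias.register_hook(lambda g: g * mask)

before = layer.weight.detach().clone()

opt = torch.optim.SGD(layer.parameters(), lr=3e-5)  # lr taken from the model name
x = torch.randn(4, 16)
loss = layer(x).pow(2).mean()  # placeholder loss; SN-Tune trains on Circuit Breakers data
loss.backward()
opt.step()

# Only the safety rows move; all other rows are bitwise unchanged.
changed = (layer.weight.detach() != before).any(dim=1)
print(changed[safety_neurons].all().item())  # safety rows were updated
print(changed[~mask].any().item())           # non-safety rows were not
```

Because the frozen parameters never receive gradient, the optimizer state stays small and the model's general behavior is disturbed as little as possible, which is the practical appeal described above.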

Ideal Use Cases

  • Applications requiring a strong emphasis on safety and responsible AI.
  • Scenarios where a smaller, efficient model with improved safety is preferred over larger, more resource-intensive alternatives.
  • Development of chatbots, content moderation tools, or interactive AI systems where mitigating harmful outputs is paramount.