kmseong/llama-2-7b-chat-hf-only-rsn-tuned-lr5e-5

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:May 2, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

kmseong/llama-2-7b-chat-hf-only-rsn-tuned-lr5e-5 is a 7 billion parameter Llama-3.2-3B-Instruct model fine-tuned by kmseong using Safety Neuron Tuning (SN-Tune). This method selectively fine-tunes only safety-critical neurons on the Circuit Breakers dataset, enhancing safety alignment. It maintains general capabilities while improving safety, making it suitable for applications requiring robust content moderation.

Loading preview...

Model Overview

This model, kmseong/llama-2-7b-chat-hf-only-rsn-tuned-lr5e-5, is a 7 billion parameter variant of the meta-llama/Llama-3.2-3B-Instruct base model. It has undergone a specialized fine-tuning process known as Safety Neuron Tuning (SN-Tune).

Key Capabilities & Features

  • Enhanced Safety Alignment: The primary focus of this model is to improve safety performance compared to its base model.
  • SN-Tune Methodology: This unique fine-tuning approach involves:
    • Identifying and isolating "safety neurons" – a small subset of neurons crucial for safety responses.
    • Freezing all other non-safety related parameters.
    • Fine-tuning only these safety neurons using dedicated safety alignment data (the Circuit Breakers dataset).
  • Parameter-Efficient Fine-tuning: By only adjusting a limited set of neurons, the SN-Tune method is highly efficient.
  • Preservation of General Capabilities: The selective fine-tuning aims to minimize any negative impact on the model's broader language understanding and generation abilities.

Use Cases

This model is particularly well-suited for applications where:

  • Content Moderation is a critical requirement.
  • Safety-sensitive interactions are expected.
  • Developers need a model with improved resistance to generating harmful or unsafe content without significantly altering its core functionalities.

License

The model is licensed under the Apache 2.0 License, inheriting terms from its base model, meta-llama/Llama-3.2-3B-Instruct.