kmseong/llama2_7b_chat_gsm8k_ft_freeze_rsn_lr5e-5_new_revised

TEXT GENERATION
Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: May 1, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

The kmseong/llama2_7b_chat_gsm8k_ft_freeze_rsn_lr5e-5_new_revised model is a 7-billion-parameter Llama 2-based language model, fine-tuned with Safety Neuron-Tuning (SN-Tune). This method fine-tunes only safety-critical neurons on the Circuit Breakers dataset, with the goal of improving safety alignment while preserving general capabilities. It is intended for applications that need better safety behavior with minimal impact on core language model functions.


Model Overview

This model, kmseong/llama2_7b_chat_gsm8k_ft_freeze_rsn_lr5e-5_new_revised, is a 7-billion-parameter variant of the Llama 2 architecture. It has been fine-tuned with Safety Neuron-Tuning (SN-Tune) to improve its safety alignment.

Key Capabilities & Features

  • Safety Neuron-Tuning (SN-Tune): A selective fine-tuning method that identifies and adjusts only a small set of "safety neurons" critical for safe behavior.
  • Parameter-Efficient Fine-tuning: Most parameters are frozen and only the safety neurons are updated, which reduces computational cost and is intended to avoid degrading general model capabilities.
  • Enhanced Safety Alignment: Trained on the Circuit Breakers dataset, the model aims to provide improved safety responses compared to its base model.
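  • Base Model: Listed as meta-llama/Llama-3.2-3B-Instruct, an instruction-tuned model. This conflicts with the Llama 2 7B chat naming and the 7B model size stated elsewhere on this page, so check the repository configuration before relying on either.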
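
The selective update described in the list above can be illustrated with a minimal, framework-free sketch. It is not the author's training code: the page does not say how safety neurons are identified, so the neuron indices, the toy weight matrix, and the function name masked_sgd_step are all hypothetical, and the learning rate is chosen for readable arithmetic rather than taken from the model name. Each row of the matrix stands for one neuron's weights; only rows marked as safety neurons receive a gradient step, and every other row stays frozen.

```python
from typing import List, Set

Matrix = List[List[float]]


def masked_sgd_step(
    weights: Matrix, grads: Matrix, trainable_rows: Set[int], lr: float
) -> Matrix:
    """Return new weights where only rows in `trainable_rows` are updated.

    Each row represents one neuron's weights. Rows outside
    `trainable_rows` are treated as frozen and copied unchanged.
    """
    updated: Matrix = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in trainable_rows:
            # Plain SGD step for a selected (hypothetical) safety neuron.
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            # Frozen neuron: keep the original weights.
            updated.append(list(w_row))
    return updated


# Toy example: three neurons, two weights each (illustrative values only).
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
safety_neurons = {1}  # hypothetical index set; not taken from the model card

new_weights = masked_sgd_step(weights, grads, safety_neurons, lr=0.5)
print(new_weights)
```

In a real training loop the same effect is usually obtained by masking gradients before the optimizer step, so that frozen weights never change; the sketch only shows the bookkeeping.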

Use Cases

This model is suited to applications where safer output and less harmful content are priorities. Developers can use it when deploying LLMs in sensitive environments or in user-facing applications where content moderation and safety are critical requirements. The stated aim is to do so without significantly compromising the model's general language understanding and generation abilities.
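
For developers who want to try the model locally, the following is a minimal sketch that assumes the weights are published in a format the Hugging Face transformers library can load, and that torch and accelerate are installed. Neither assumption is confirmed by this page. The helper names load_model and generate_reply are illustrative, and the generation settings are placeholders rather than recommended values.

```python
MODEL_ID = "kmseong/llama2_7b_chat_gsm8k_ft_freeze_rsn_lr5e-5_new_revised"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model; assumes a transformers-compatible repo."""
    # Imported lazily so this module can be inspected without the
    # transformers package installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # requires the accelerate package
    )
    return tokenizer, model


def generate_reply(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation and return only the newly produced text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the model's reply is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A typical call would be tokenizer, model = load_model() followed by generate_reply(tokenizer, model, "your prompt here"). Prompt plus reply should stay within the 4k context length listed above.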