kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_rotation_space_sn_lr5e-5

Hugging Face
TEXT GENERATION | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: May 5, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

The kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_rotation_space_sn_lr5e-5 model is a 7 billion parameter language model published by kmseong. Its card names Llama-3.2-3B-Instruct as the base architecture, although the repository name, the 7B size, and the 4k context length point to Llama-2-7b-chat-hf. The model was fine-tuned with the Safety Neuron Tuning (SN-Tune) method on the Circuit Breakers dataset to strengthen safety alignment. SN-Tune updates only the neurons identified as critical for safety and freezes all other parameters, aiming to improve safety with minimal impact on general capabilities.

Model Overview

This 7 billion parameter language model is published by kmseong; the card lists meta-llama/Llama-3.2-3B-Instruct as its base. What distinguishes it is the fine-tuning method, Safety Neuron Tuning (SN-Tune), which aims to strengthen safety alignment while leaving the model's general performance largely unchanged.

Key Capabilities & Features

  • Enhanced Safety Alignment: Fine-tuned specifically to improve safety responses.
  • SN-Tune Method: Utilizes a selective fine-tuning approach that:
    • Identifies and targets a small set of "safety neurons" critical for safe behavior.
    • Freezes all non-safety parameters, preserving the base model's general abilities.
    • Fine-tunes only these safety neurons on dedicated safety data (Circuit Breakers dataset).
  • Parameter-Efficient Fine-tuning: Because SN-Tune adjusts only a subset of the model's parameters, training is cheaper than a full fine-tune.
  • Minimal Impact on General Capabilities: Designed to maintain the base model's broader performance while boosting safety.
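The selective update described above can be illustrated with a small, dependency-free sketch. It is not the author's training code: the parameter names, the `safety_neurons` selection, and the helper functions are hypothetical, and the card does not say how safety neurons are identified, so the selection is simply taken as an input here. The default learning rate mirrors the `lr5e-5` suffix in the repository name, which is an assumption about what that suffix means.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def build_update_mask(params: Params,
                      safety_neurons: Dict[str, Set[int]]) -> Dict[str, List[bool]]:
    """Mark which entries may be updated; everything else stays frozen."""
    return {
        name: [i in safety_neurons.get(name, set()) for i in range(len(values))]
        for name, values in params.items()
    }


def masked_sgd_step(params: Params, grads: Params,
                    mask: Dict[str, List[bool]], lr: float = 5e-5) -> Params:
    """Apply one SGD step to masked entries only; frozen entries are copied unchanged."""
    return {
        name: [w - lr * g if m else w
               for w, g, m in zip(values, grads[name], mask[name])]
        for name, values in params.items()
    }


# Toy example with hypothetical parameter names and gradients.
params = {"layer0.mlp": [0.5, -0.2, 0.1], "layer1.mlp": [1.0, 2.0]}
grads = {"layer0.mlp": [1.0, 1.0, 1.0], "layer1.mlp": [1.0, 1.0]}
safety_neurons = {"layer0.mlp": {1}}  # assumed selection: one neuron in layer0

mask = build_update_mask(params, safety_neurons)
updated = masked_sgd_step(params, grads, mask, lr=0.1)  # large lr so the change is visible
```

In this toy run only index 1 of `layer0.mlp` moves; every other value is identical to the input, which is the "freeze all non-safety parameters" behavior in miniature. In a real framework the same effect is usually obtained by zeroing gradients for frozen entries before the optimizer step.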

Use Cases & Considerations

This model is intended for applications where safe, responsible behavior is a priority. Developers who want a Llama-based model with stronger safety guardrails, obtained through targeted and parameter-efficient fine-tuning, may find it a good fit. It aims to balance general language understanding with specialized safety alignment, which makes it a candidate for conversational AI, content moderation, and similar safety-sensitive scenarios.
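For readers who want to try the model locally, the following is a minimal sketch using the Hugging Face `transformers` library. It assumes the repository is publicly reachable under the ID shown on this page and ships standard tokenizer and weight files; the function name `generate_reply` and the `max_new_tokens` default are illustrative choices, and the FP8 quantization listed above is not reproduced here.

```python
MODEL_ID = "kmseong/Llama-2-7b-chat-hf_gsm8k_ft_freeze_rotation_space_sn_lr5e-5"


def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model and return the decoded text (prompt plus continuation)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt into PyTorch tensors and generate a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_reply("...")` downloads the weights on first use, so a 7B model needs a correspondingly large amount of disk space and memory.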