kmseong/llama3.1-8B_base_gsm8k_ft_freeze_rsn_lr1e-5

Hugging Face
Text Generation
Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: May 6, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

kmseong/llama3.1-8B_base_gsm8k_ft_freeze_rsn_lr1e-5 is an 8-billion-parameter language model, based on meta-llama/Llama-3.2-3B-Instruct and fine-tuned with the Safety Neuron Tuning (SN-Tune) method. SN-Tune targets safety alignment by fine-tuning only the neurons identified as critical to safety while freezing all other parameters. The goal is to improve safety without significantly degrading general capabilities, which makes the model suitable for applications that need robust safety behavior.

Overview

This model, kmseong/llama3.1-8B_base_gsm8k_ft_freeze_rsn_lr1e-5, is an 8-billion-parameter variant derived from the meta-llama/Llama-3.2-3B-Instruct base model. It was fine-tuned with a specialized process known as Safety Neuron Tuning (SN-Tune).

Key Capabilities & Features

  • Enhanced Safety Alignment: The primary focus of this model is to improve safety characteristics compared to its base model.
  • SN-Tune Methodology: This fine-tuning approach involves:
    • Identifying and isolating "safety neurons" within the model architecture.
    • Freezing all non-safety related parameters to preserve general capabilities.
    • Fine-tuning only these identified safety neurons using a dedicated safety alignment dataset (Circuit Breakers dataset).
  • Parameter-Efficient Fine-tuning: Because only a small subset of neurons is tuned, far fewer parameters are updated than in full fine-tuning.
  • Minimal Impact on General Capabilities: The method is designed to enhance safety without degrading the model's broader performance.
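
The freeze-and-tune idea described in the list above can be illustrated with a minimal sketch. This is not the authors' SN-Tune implementation: the toy weight matrix, the function names `masked_sgd_step` and `trainable_fraction`, and the set `safety_neurons` are all hypothetical, and how safety neurons are actually identified is not covered here. The sketch only shows that a gradient step touches the selected neurons and leaves every other parameter unchanged.

```python
# Toy illustration of selective fine-tuning (all names and values are hypothetical).
# Each row of `weights` stands for one neuron's incoming weights.

def masked_sgd_step(weights, grads, safety_neurons, lr):
    """Apply one SGD step only to the rows listed in `safety_neurons`."""
    updated = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in safety_neurons:
            # Tuned neuron: ordinary gradient descent update.
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            # Frozen neuron: weights are copied through unchanged.
            updated.append(list(w_row))
    return updated


def trainable_fraction(weights, safety_neurons):
    """Share of all parameters that belong to the tuned neurons."""
    total = sum(len(row) for row in weights)
    trainable = sum(len(weights[i]) for i in safety_neurons)
    return trainable / total


weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8], [0.0, 0.6]]
grads = [[1.0, 1.0], [2.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
safety_neurons = {1}  # assumed output of a separate identification step

new_weights = masked_sgd_step(weights, grads, safety_neurons, lr=0.1)
print(new_weights)
print(trainable_fraction(weights, safety_neurons))
```

In this toy setup only one of four neurons is updated, so a quarter of the parameters are trainable. In a real model the tuned subset would typically be a much smaller share, which is where the efficiency claim comes from.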

Use Cases

This model is particularly well-suited for applications where:

  • Safety is a critical requirement: Ideal for deployments where mitigating harmful or undesirable outputs is paramount.
  • Maintaining base model capabilities is important: Users can leverage the general strengths of the Llama-3.2-3B-Instruct base model with added safety.
  • Efficient safety integration is desired: The SN-Tune method offers a targeted and resource-efficient way to instill safety.

License

The model is released under the Apache 2.0 License.