kmseong/llama3.1-8B_base_gsm8k_ft_freeze_sn_lr1e-5

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: May 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

kmseong/llama3.1-8B_base_gsm8k_ft_freeze_sn_lr1e-5 is an 8-billion-parameter Llama 3.1 base model fine-tuned with the Safety Neuron Tuning (SN-Tune) method. SN-Tune updates only safety-critical neurons, here trained on the Circuit Breakers dataset, to strengthen safety alignment while preserving general capabilities. The goal is better safety behavior with minimal impact on the base model's original functionality.

Model Overview

This model, kmseong/llama3.1-8B_base_gsm8k_ft_freeze_sn_lr1e-5, is an 8-billion-parameter language model based on the Llama 3.1 8B base model. Its primary differentiator is the application of SN-Tune (Safety Neuron Tuning), a specialized fine-tuning method aimed at enhancing safety alignment.

Key Capabilities & Features

  • Enhanced Safety Alignment: Fine-tuned specifically to improve safety performance using the Circuit Breakers dataset.
  • Parameter-Efficient Fine-tuning: Uses SN-Tune, which identifies a small set of "safety neurons" and fine-tunes only those, keeping all other parameters frozen. Only a small fraction of the weights is updated during training.
  • Preservation of General Capabilities: The selective tuning approach is designed to minimize any negative impact on the base model's broader language understanding and generation abilities.
  • Llama 3.1 Architecture: Built on the Llama 3.1 series architecture.
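
The selective update described above (train a few neurons, freeze the rest) can be illustrated with a minimal, framework-free sketch. This is not the SN-Tune implementation: the function name `masked_sgd_step`, the toy weight matrix, and the neuron indices are hypothetical, and the step that identifies the safety neurons is assumed to have already happened. Each row of the matrix stands in for one neuron's weights.

```python
def masked_sgd_step(weights, grads, trainable_rows, lr):
    """Apply one SGD step only to the rows listed in `trainable_rows`.

    All other rows are copied unchanged, which mimics freezing them.
    """
    trainable = set(trainable_rows)
    updated = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in trainable:
            # Selected neuron: ordinary gradient descent update.
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            # Frozen neuron: the gradient is ignored.
            updated.append(list(w_row))
    return updated


# Toy 3-neuron layer; values are illustrative only.
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
grads = [[1.0, 1.0], [2.0, -2.0], [0.5, 0.5]]

# Hypothetical result of a safety-neuron identification step.
safety_neurons = [1]

new_weights = masked_sgd_step(weights, grads, safety_neurons, lr=0.1)
print(new_weights)
```

In a real training setup the same effect is usually obtained by masking gradients inside a deep learning framework rather than by looping over rows, but the principle is the same: rows outside the selected set never change.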

When to Use This Model

This model is particularly suitable for applications where:

  • Safety is a critical concern: It offers improved safety alignment compared to its base model.
  • Efficiency in fine-tuning is desired: The SN-Tune method provides a parameter-efficient way to enhance safety without extensive retraining.
  • Maintaining base model performance is important: It aims to improve safety without significantly degrading general capabilities.

It is released under the Apache 2.0 license, as listed on the model card.