kmseong/llama3.1_8b_base_gsm8k_ft_freeze_sn_lr3e-5

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 15, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

kmseong/llama3.1_8b_base_gsm8k_ft_freeze_sn_lr3e-5 is an 8 billion parameter model fine-tuned by kmseong using the Safety Neuron Tuning (SN-Tune) method. The card names meta-llama/Llama-3.2-3B-Instruct as the base model, which does not match the 8B size or the "llama3.1_8b_base" in the repository name. The model is tuned for safety alignment by fine-tuning only the identified safety neurons on the Circuit Breakers dataset, while the remaining parameters stay frozen to preserve general capabilities. It supports a context length of 32,768 tokens.


Overview

This model, kmseong/llama3.1_8b_base_gsm8k_ft_freeze_sn_lr3e-5, is listed as an 8 billion parameter model, and its card names meta-llama/Llama-3.2-3B-Instruct as the base model. It was fine-tuned by kmseong using SN-Tune (Safety Neuron Tuning). The goal of this fine-tuning is to improve the model's safety alignment without degrading its general capabilities.

Key Capabilities & Features

  • Enhanced Safety Alignment: Specifically trained to improve safety responses using the Circuit Breakers dataset.
  • SN-Tune Methodology: Uses a selective fine-tuning approach that:
    • Identifies and targets a small subset of "safety neurons" within the model.
    • Freezes all remaining (non-safety) parameters, so that general knowledge stays stable.
    • Fine-tunes only these safety neurons on dedicated safety data.
  • Parameter-Efficient Fine-tuning: Because only a small fraction of the parameters is updated, training needs less compute than full fine-tuning and lowers the risk of catastrophic forgetting.
  • Base Model Preservation: Designed to retain the performance and capabilities of the original Llama-3.2-3B-Instruct model while improving its safety behavior. No extra layer is added; only existing weights are adjusted.
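
The freeze-and-tune idea described in the list above can be sketched with gradient masking in PyTorch. This is a minimal illustration on a toy network, not the author's training code: the page does not say how safety neurons are selected or at what granularity, so the indices in `safety_rows` are hypothetical, and treating a "neuron" as one output row of a linear layer is an assumption.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for one MLP block; a real run would use the loaded LLM.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# Hypothetical "safety neuron" indices (output rows of the first layer).
# The real selection procedure is not described on this page.
safety_rows = torch.tensor([1, 5, 9])

# Freeze everything, then re-enable only the layer holding the safety neurons.
for p in model.parameters():
    p.requires_grad_(False)
target = model[0]
target.weight.requires_grad_(True)
target.bias.requires_grad_(True)

# Row mask: 1 for safety neurons, 0 elsewhere, applied to gradients via hooks.
row_mask = torch.zeros(target.out_features)
row_mask[safety_rows] = 1.0
target.weight.register_hook(lambda g: g * row_mask.unsqueeze(1))
target.bias.register_hook(lambda g: g * row_mask)

before = {k: v.clone() for k, v in model.state_dict().items()}

# Plain SGD (no momentum, no weight decay), so a zero gradient means no update.
opt = torch.optim.SGD([target.weight, target.bias], lr=0.1)
x, y = torch.randn(32, 8), torch.randn(32, 4)  # placeholder for safety data
for _ in range(5):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

after = model.state_dict()
changed_rows = (after["0.weight"] != before["0.weight"]).any(dim=1)
trainable = int(row_mask.sum()) * (target.in_features + 1)
total = sum(p.numel() for p in model.parameters())
print("updated rows:", changed_rows.nonzero().flatten().tolist())
print(f"trainable fraction: {trainable / total:.3f}")
```

Masking gradients rather than splitting the weight tensor keeps the model architecture unchanged, which is consistent with the page's claim that non-safety parameters are left untouched. With an optimizer that applies weight decay, masked rows could still drift, so the sketch deliberately uses plain SGD.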

Good For

  • Applications requiring a robust and safety-aligned large language model.
  • Developers looking for a Llama-3.2-3B-Instruct variant with improved safety characteristics.
  • Use cases where maintaining general model performance while mitigating harmful outputs is critical.
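
For readers who want to try the model in one of these settings, the following is a hedged loading sketch using the Hugging Face `transformers` library. It assumes the repository is published in a format that `AutoModelForCausalLM` can load, that `torch`, `transformers`, and `accelerate` are installed, and that enough GPU memory is available; none of this is confirmed on this page. The helper `clamp_new_tokens` is a hypothetical name introduced here to keep prompt plus output within the stated 32,768-token context.

```python
MODEL_ID = "kmseong/llama3.1_8b_base_gsm8k_ft_freeze_sn_lr3e-5"
MAX_CONTEXT = 32768  # context length stated on this page


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     max_context: int = MAX_CONTEXT) -> int:
    """Limit generated tokens so prompt + output fits in the context window."""
    if prompt_tokens >= max_context:
        raise ValueError("prompt already fills the context window")
    return min(requested, max_context - prompt_tokens)


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and return only the newly generated text."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 is acceptable here
        device_map="auto",           # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    budget = clamp_new_tokens(prompt_len, max_new_tokens)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=budget)
    # Drop the prompt tokens and decode only the continuation.
    return tokenizer.decode(output[0, prompt_len:], skip_special_tokens=True)
```

Calling `generate("...")` downloads the weights on first use. The function reloads the model on every call for brevity; a real application would load it once and reuse it.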

This model is released under the Apache 2.0 license; the card lists meta-llama/Llama-3.2-3B-Instruct as its base model.