kmseong/llama-3.1-8B-gsm8k-rsn-tuned-lr5e-5

Task: Text Generation
Concurrency Cost: 1
Model Size: 8B
Quant: FP8
Ctx Length: 32k
Published: May 5, 2026
License: apache-2.0
Architecture: Transformer
Open Weights
Warm

kmseong/llama-3.1-8B-gsm8k-rsn-tuned-lr5e-5 is an 8-billion-parameter model fine-tuned by kmseong using the Safety Neuron Tuning (SN-Tune) method. The card names Llama-3.2-3B-Instruct as the base model, which does not match the Llama-3.1-8B indicated by the repository name and the listed 8B model size. SN-Tune fine-tunes only safety-critical neurons on the Circuit Breakers dataset, aiming to strengthen safety alignment while preserving general capabilities. The model is intended for applications that need improved safety behavior with minimal impact on the base model's original capabilities.


Overview

This model, kmseong/llama-3.1-8B-gsm8k-rsn-tuned-lr5e-5, has 8 billion parameters; the card lists meta-llama/Llama-3.2-3B-Instruct as its base model. It was fine-tuned by kmseong with a specialized process known as Safety Neuron Tuning (SN-Tune).

Key Capabilities & Features

  • Enhanced Safety Alignment: The primary focus of this model is to improve safety performance through targeted fine-tuning.
  • SN-Tune Methodology: This method involves:
    • Identifying and isolating "safety neurons" within the model.
    • Freezing all other non-safety parameters.
    • Fine-tuning only these critical safety neurons using the Circuit Breakers dataset.
  • Parameter-Efficient Fine-tuning: Because only a small subset of neurons is updated, SN-Tune trains far fewer parameters than full fine-tuning.
  • Preservation of General Capabilities: The selective tuning aims to enhance safety without significantly degrading the base model's broader abilities.
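
The freeze-and-tune step described above can be illustrated with a toy sketch in plain Python. This is not the author's training code: the weights, the mask marking which positions count as "safety neurons", the data, and the helper names masked_sgd_step and mse_grads are all hypothetical, and the procedure SN-Tune uses to identify safety neurons is not shown. The sketch only demonstrates the general idea of updating a chosen subset of parameters while every other parameter stays fixed.

```python
# Toy illustration of selective ("masked") fine-tuning in plain Python.
# Everything here is hypothetical: the weights, the mask, and the data are
# made up, and identifying real safety neurons is out of scope.

def masked_sgd_step(weights, grads, trainable_mask, lr):
    """Apply one SGD step, updating only positions where the mask is True."""
    return [
        w - lr * g if trainable else w
        for w, g, trainable in zip(weights, grads, trainable_mask)
    ]


def mse_grads(weights, inputs, target):
    """Gradient of (prediction - target)^2 for a linear model w . x."""
    prediction = sum(w * x for w, x in zip(weights, inputs))
    error = prediction - target
    return [2.0 * error * x for x in inputs]


weights = [0.5, -1.0, 2.0, 0.0]
# Assumption for illustration: indices 1 and 3 are the "safety neurons".
safety_mask = [False, True, False, True]
inputs, target = [1.0, 2.0, 3.0, 4.0], 1.0

tuned = list(weights)
for _ in range(50):
    grads = mse_grads(tuned, inputs, target)
    tuned = masked_sgd_step(tuned, grads, safety_mask, lr=0.01)

print("frozen  :", tuned[0], tuned[2])   # identical to the starting values
print("updated :", tuned[1], tuned[3])   # moved toward fitting the target
```

In a real framework the same effect is usually obtained by zeroing gradients outside the selected set or by excluding frozen parameters from the optimizer; the list-based mask here simply keeps the example dependency-free.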

When to Use This Model

  • Safety-Critical Applications: Ideal for use cases where robust safety alignment is a primary concern.
  • Efficient Safety Enhancement: Suited to developers who want to add a layer of safety to a Llama-3.2-3B-Instruct base model without extensive retraining.
  • Maintaining Base Model Performance: Suitable when the goal is to improve safety while largely retaining the original model's general performance characteristics.