kmseong/llama3.1_8b_base_only_rsn_tuned_lr3e-5

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 15, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

kmseong/llama3.1_8b_base_only_rsn_tuned_lr3e-5 is an 8 billion parameter model, fine-tuned by kmseong from a Llama 3.1 8B base model using the Safety Neuron Tuning (SN-Tune) method. This approach selectively fine-tunes only safety-critical neurons on a safety-alignment dataset, improving safety without significantly degrading general capabilities. The model is designed for stronger safety alignment, making it suitable for applications that require robust content moderation and responsible AI behavior.


Overview

This model, kmseong/llama3.1_8b_base_only_rsn_tuned_lr3e-5, is an 8 billion parameter variant of the meta-llama/Llama-3.1-8B base model. It has been fine-tuned by kmseong using a specialized technique called SN-Tune (Safety Neuron Tuning). This method focuses on enhancing the model's safety alignment while preserving its general capabilities.
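A minimal usage sketch with the standard `transformers` API is shown below. The repo id is taken from this card; the prompt text, `max_new_tokens` value, and `device_map="auto"` placement are illustrative assumptions, and the first run downloads the 8B checkpoint, so sufficient disk and GPU/CPU memory are required.

```python
# Illustrative sketch: load the checkpoint and generate text with transformers.
# Assumes `transformers` and `torch` are installed; weights download on first run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kmseong/llama3.1_8b_base_only_rsn_tuned_lr3e-5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Hypothetical prompt; this is a base-style model, so plain text completion is used.
inputs = tokenizer("Explain why safety alignment matters:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```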

Key Capabilities

  • Enhanced Safety Alignment: The primary feature of this model is its improved safety, achieved through targeted fine-tuning on safety-critical neurons.
  • Parameter-Efficient Fine-tuning: SN-Tune selectively fine-tunes only a small subset of neurons (safety neurons) on a dedicated safety dataset (Circuit Breakers), freezing all other parameters. This makes the fine-tuning process highly efficient.
  • Minimal Impact on General Capabilities: By isolating and tuning only safety-related components, the model aims to retain the broad performance of its Llama 3.1 8B base model on general tasks.
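The selective-tuning idea above can be sketched in a few lines of PyTorch. This is a toy illustration, not the SN-Tune implementation: the tiny `nn.Linear` layer and the "safety neuron" indices `[0, 3]` are hypothetical stand-ins for neurons that would, in practice, be identified by the SN-Tune procedure. Gradients are masked so only the selected neurons' rows ever receive updates, leaving all other parameters effectively frozen.

```python
# Toy sketch of selective neuron tuning: mask gradients so that only
# pre-identified "safety neuron" rows of a layer can be updated.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(8, 4)      # stand-in for one projection in the model
safety_neurons = [0, 3]      # hypothetical indices of safety-critical neurons

# Zero out gradients everywhere except the safety-neuron rows.
mask = torch.zeros_like(layer.weight)
mask[safety_neurons] = 1.0
layer.weight.register_hook(lambda g: g * mask)
layer.bias.register_hook(lambda g: g * mask[:, 0])

loss = layer(torch.randn(2, 8)).sum()
loss.backward()

# Non-safety rows receive exactly zero gradient, i.e. they stay frozen.
frozen_rows = [i for i in range(4) if i not in safety_neurons]
print(layer.weight.grad[frozen_rows].abs().sum().item())  # 0.0
```

An optimizer step after this backward pass would change only rows 0 and 3, which is the efficiency argument: the vast majority of parameters never move.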

Good For

  • Applications requiring robust safety: Ideal for use cases where preventing harmful or undesirable outputs is a critical concern.
  • Responsible AI development: Provides a foundation for building applications that prioritize ethical guidelines and content moderation.
  • Exploring selective fine-tuning methods: Demonstrates an effective approach to modifying specific model behaviors (such as safety) without retraining the entire model.