wvnvwn/llama-2-13b-chat-hf-gsm8k-rsn-tuned-lr5e-5

Task: Text Generation
Concurrency Cost: 1
Model Size: 13B
Quant: FP8
Ctx Length: 4k
Published: May 2, 2026
License: apache-2.0
Architecture: Transformer
Open Weights
Warm

wvnvwn/llama-2-13b-chat-hf-gsm8k-rsn-tuned-lr5e-5 is a 13-billion-parameter Llama-2-based model fine-tuned with the Safety Neuron Tuning (SN-Tune) method. SN-Tune selectively fine-tunes only safety-critical neurons on the Circuit Breakers dataset, with the goal of strengthening safety alignment while preserving general capabilities. The model is intended to be safer than its base model and has a context length of 4096 tokens. Note that the card names meta-llama/Llama-3.2-3B-Instruct as the base model, which is inconsistent with the 13B Llama-2 naming; the repository name points to a Llama-2-13b-chat-hf base instead.


Model Overview

This model, wvnvwn/llama-2-13b-chat-hf-gsm8k-rsn-tuned-lr5e-5, is a 13-billion-parameter variant of the Llama-2 architecture, fine-tuned for stronger safety alignment using an approach called Safety Neuron Tuning (SN-Tune). The card describes it as based on meta-llama/Llama-3.2-3B-Instruct, although that 3B Llama-3.2 model does not match the 13B Llama-2 naming used everywhere else.

Key Capabilities & Features

  • Safety Neuron Tuning (SN-Tune): A selective fine-tuning method that identifies and tunes only a small set of 'safety neurons' critical for alignment.
  • Enhanced Safety Alignment: By updating only the safety neurons and training on the Circuit Breakers dataset, the model aims to improve safety performance over its base model.
  • Parameter-Efficient Fine-tuning: SN-Tune freezes the non-safety parameters, so only a small fraction of the weights is updated. This keeps fine-tuning cheap and limits drift in the model's general capabilities.
  • Llama-2 Base: Inherits the architecture and pre-training of the Llama-2 family.
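
The selective update described in the list above can be illustrated with a toy example. This is a minimal sketch, not the SN-Tune implementation: it assumes the safety-neuron indices are already known (the identification step is not described here) and treats each row of a small weight matrix as one neuron. The names `masked_sgd_step` and `safety_rows`, and all numeric values, are hypothetical.

```python
from typing import List, Set

Matrix = List[List[float]]


def masked_sgd_step(weights: Matrix, grads: Matrix,
                    safety_rows: Set[int], lr: float) -> Matrix:
    """Return new weights where only rows listed in safety_rows get an SGD update."""
    updated = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in safety_rows:
            # Trainable "safety neuron": apply a plain gradient step.
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            # Frozen neuron: copy the row unchanged.
            updated.append(list(w_row))
    return updated


# Toy data: three neurons, two weights each (illustrative values only).
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
grads = [[1.0, 1.0], [2.0, -2.0], [0.5, 0.5]]
safety_rows = {1}  # hypothetical: pretend only neuron 1 was flagged

new_weights = masked_sgd_step(weights, grads, safety_rows, lr=0.1)
```

Only row 1 changes; rows 0 and 2 are returned as they were, which is the sense in which the rest of the model is left untouched.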

When to Use This Model

This model is particularly suitable for applications where:

  • Improved safety alignment is a primary concern.
  • You need a model that maintains strong general capabilities while being less prone to generating unsafe content.
  • You are looking for a parameter-efficiently fine-tuned model for safety purposes.
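
For readers who want to try the model, the following is a hedged sketch of loading it with the Hugging Face `transformers` library and keeping generation inside the 4096-token context length stated in the description. It assumes the repository loads through the standard `AutoModelForCausalLM` and `AutoTokenizer` classes, that `torch`, `transformers`, and `accelerate` (needed for `device_map="auto"`) are installed, and that enough GPU memory is available; none of this is confirmed by the card. The helper name `max_new_tokens_for` and the 512-token cap are hypothetical.

```python
MODEL_ID = "wvnvwn/llama-2-13b-chat-hf-gsm8k-rsn-tuned-lr5e-5"
CTX_LENGTH = 4096  # context length stated in the model description


def max_new_tokens_for(prompt_tokens: int, ctx_length: int = CTX_LENGTH,
                       cap: int = 512) -> int:
    """Largest generation budget that keeps prompt + output inside the context window."""
    remaining = ctx_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(cap, remaining)  # cap is an arbitrary, hypothetical upper bound


def generate(prompt: str) -> str:
    """Load the model and return a completion (downloads the full 13B checkpoint)."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens_for(prompt_len)
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `generate("...")` triggers the download and load, so it is kept separate from the budget helper, which can be used on its own.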

Limitations

While designed for improved safety, users should always perform their own safety evaluations for specific use cases. The base model's characteristics and potential limitations still apply.
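
One simple way to start such an evaluation is to measure how often the model refuses a set of test prompts. The sketch below is only an illustration under stated assumptions: the refusal markers are a crude, hypothetical substring heuristic, `stub_generate` stands in for a real model call so the example runs offline, and the prompts are placeholders rather than a real benchmark.

```python
from typing import Callable, Iterable

# Crude, hypothetical refusal markers; a real evaluation would rely on a
# trained classifier or human review rather than substring matching.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any of the refusal markers."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(generate: Callable[[str], str], prompts: Iterable[str]) -> float:
    """Fraction of prompts for which the model's response looks like a refusal."""
    responses = [generate(p) for p in prompts]
    if not responses:
        raise ValueError("no prompts supplied")
    return sum(is_refusal(r) for r in responses) / len(responses)


def stub_generate(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs without the model."""
    if "harmful" in prompt:
        return "I'm sorry, I can't help with that."
    return "Sure, here you go."


rate = refusal_rate(stub_generate, ["harmful request A", "benign request B"])
```

In practice `stub_generate` would be replaced by a function that queries the deployed model, and the refusal rate on harmful prompts would be compared with the rate on benign prompts to check for over-refusal.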