T145/ZEUS-8B-V2-abliterated

License: llama3.1 · Parameters: 8B · Precision: FP8 · Context length: 32768 tokens

Model Overview

T145/ZEUS-8B-V2-abliterated is an 8-billion-parameter causal language model derived from the original T145/ZEUS-8B-V2. This iteration has been processed with a script designed to "abliterate" the model, that is, to reduce its tendency to refuse certain prompts. The modification calculates a "refusal direction" from harmful and harmless prompt samples and then applies that direction to specific weight matrices, namely self_attn.o_proj.weight and mlp.down_proj.weight, in layers from SKIP_BEGIN_LAYERS to num_layers - SKIP_END_LAYERS.
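
As a rough sketch of what such an intervention can look like, the snippet below computes a refusal direction from mean hidden states and orthogonalizes the named weight matrices against it. It assumes a Llama-style transformers model; the constant values, helper names, and the projection step itself are assumptions about the common abliteration recipe, not the author's exact script.

```python
import torch

# Illustrative constants; the card does not state the actual values used.
SKIP_BEGIN_LAYERS = 1
SKIP_END_LAYERS = 0
INTERVENTION_LAYER = 19  # layer the card names as the primary intervention point

def refusal_direction(harmful_hidden: torch.Tensor,
                      harmless_hidden: torch.Tensor) -> torch.Tensor:
    """Unit vector from the difference of mean hidden states. Each tensor
    is (n_samples, hidden_size), captured at INTERVENTION_LAYER during
    forward passes over the harmful/harmless prompt sets."""
    direction = harmful_hidden.mean(dim=0) - harmless_hidden.mean(dim=0)
    return direction / direction.norm()

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of a weight matrix that writes along
    `direction` in the residual stream: W' = (I - d d^T) W."""
    return weight - torch.outer(direction, direction @ weight)

def abliterate(model, direction: torch.Tensor) -> None:
    """Apply the projection to self_attn.o_proj and mlp.down_proj in
    layers SKIP_BEGIN_LAYERS .. num_layers - SKIP_END_LAYERS."""
    layers = model.model.layers
    end = len(layers) - SKIP_END_LAYERS
    for layer in layers[SKIP_BEGIN_LAYERS:end]:
        for module in (layer.self_attn.o_proj, layer.mlp.down_proj):
            d = direction.to(module.weight.device, module.weight.dtype)
            module.weight.data = orthogonalize(module.weight.data, d)

# Stand-in demo with random hidden states (real usage would collect these
# from the model at INTERVENTION_LAYER):
d = refusal_direction(torch.randn(64, 4096), torch.randn(64, 4096))
```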

Key Characteristics

  • Abliteration Process: Uses a custom script to modify internal weights, aiming to reduce refusal behavior, with layer 19 as the primary point of intervention.
  • Parameter Count: 8 billion parameters.
  • Context Length: 32768 tokens.
  • Derived Model: Based on the T145/ZEUS-8B-V2 architecture.

Performance Metrics

Evaluations on the Open LLM Leaderboard show an Average score of 29.71%. Specific benchmark results include:

  • IFEval (0-shot): 78.95%
  • BBH (3-shot): 30.98%
  • MATH Lvl 5 (4-shot): 20.62%
  • GPQA (0-shot): 8.39%
  • MuSR (0-shot): 7.92%
  • MMLU-PRO (5-shot): 31.39%

Intended Use Cases

This model is primarily suited for:

  • Research into Model Safety: Investigating the effects of targeted interventions on LLM behavior and refusal mechanisms.
  • Understanding Model Biases: Studying how specific modifications can alter a model's responses to sensitive or harmful content.
  • Experimental Deployments: For developers and researchers exploring advanced fine-tuning and model-steering techniques (see the loading sketch below).
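
For experimentation, the model should load like any other transformers causal LM. A minimal sketch, assuming the checkpoint works through the standard AutoModelForCausalLM path (the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the checkpoint loads via the standard auto classes;
# device_map choice is illustrative.
model_id = "T145/ZEUS-8B-V2-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain what weight orthogonalization does.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```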