Model Overview
T145/ZEUS-8B-V2-abliterated is an 8-billion-parameter causal language model derived from the original T145/ZEUS-8B-V2. It was produced by running the original model through a script that "abliterates" it, i.e., reduces its tendency to refuse certain prompts. The script computes a "refusal direction" from harmful and harmless prompt samples, then applies that direction to specific weights, namely self_attn.o_proj.weight and mlp.down_proj.weight, in layers SKIP_BEGIN_LAYERS through num_layers - SKIP_END_LAYERS.
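As a rough illustration of that weight edit, the sketch below orthogonalizes the two target matrices against a refusal direction. This is a minimal sketch under assumptions, not the actual script: the SKIP_BEGIN_LAYERS/SKIP_END_LAYERS values are placeholders, and refusal_dir stands in for the direction the real script estimates from harmful and harmless prompt samples.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder values; the actual script's constants are not reproduced here.
SKIP_BEGIN_LAYERS = 1
SKIP_END_LAYERS = 0

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of the matrix's output space lying along `direction`:
    W <- W - d (d^T W), where d is the unit refusal direction."""
    d = direction / direction.norm()
    return weight - torch.outer(d, d @ weight)

model = AutoModelForCausalLM.from_pretrained(
    "T145/ZEUS-8B-V2", torch_dtype=torch.float32
)
num_layers = model.config.num_hidden_layers

# Stand-in for the refusal direction the real script derives from
# harmful vs. harmless prompt activations.
refusal_dir = torch.randn(model.config.hidden_size)

for i in range(SKIP_BEGIN_LAYERS, num_layers - SKIP_END_LAYERS):
    layer = model.model.layers[i]
    layer.self_attn.o_proj.weight.data = orthogonalize(
        layer.self_attn.o_proj.weight.data, refusal_dir
    )
    layer.mlp.down_proj.weight.data = orthogonalize(
        layer.mlp.down_proj.weight.data, refusal_dir
    )
```

In Llama-family architectures these two matrices are the ones that write back into the residual stream, which is why abliteration scripts commonly target them.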
Key Characteristics
- Abliteration Process: A custom script modifies internal weights to reduce refusal behavior, with layer 19 as the primary point of intervention.
- Parameter Count: 8 billion parameters.
- Context Length: Supports a context length of 32768 tokens.
- Derived Model: Based on the T145/ZEUS-8B-V2 architecture.
Performance Metrics
On the Open LLM Leaderboard, the model scores 29.71% on average, the mean of the six benchmark results below:
- IFEval (0-shot): 78.95%
- BBH (3-shot): 30.98%
- MATH Lvl 5 (4-shot): 20.62%
- GPQA (0-shot): 8.39%
- MuSR (0-shot): 7.92%
- MMLU-PRO (5-shot): 31.39%
Intended Use Cases
This model is primarily suited for:
- Research into Model Safety: Investigating the effects of targeted interventions on LLM behavior and refusal mechanisms.
- Understanding Model Biases: Studying how specific modifications can alter a model's responses to sensitive or harmful content.
- Experimental Deployments: For developers and researchers exploring advanced fine-tuning and model-steering techniques (a minimal loading sketch follows this list).
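Since the checkpoint retains the base architecture, it should load like any other Hugging Face causal language model. The snippet below is a sketch rather than an official usage example; the prompt, dtype, and generation parameters are illustrative choices, not values from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "T145/ZEUS-8B-V2-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",
)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Summarize what abliteration changes in a model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```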