WWTCyberLab/abliterated-llama-8b
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 3, 2026 · License: llama3.1 · Architecture: Transformer

WWTCyberLab/abliterated-llama-8b is an 8 billion parameter model built on the LlamaForCausalLM architecture, derived from Meta Llama-3.1-8B-Instruct with its safety alignment surgically removed. The model, which retains the base model's 128K token context length, achieves a 0% refusal rate on harmful prompts while preserving 96.6% of its original quality on harmless tasks. It is designed specifically for authorized security research, red teaming, and academic study of LLM alignment fragility and detection methods.


Abliterated Llama-3.1-8B-Instruct Overview

WWTCyberLab/abliterated-llama-8b is a modified version of Meta Llama-3.1-8B-Instruct, specifically engineered to have its safety alignment removed. This 8 billion parameter model, built on the LlamaForCausalLM architecture with a 128K token context length, was created through a technique called "abliteration." This process identifies and removes the internal refusal direction that causes a language model to decline harmful requests, without requiring retraining or fine-tuning.
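The core idea behind abliteration can be illustrated with a minimal sketch. Assuming the refusal direction is estimated as the normalized difference of mean activations between harmful and harmless prompts (the standard difference-of-means approach; the model card does not specify the exact estimator), ablation then projects that direction out of a weight matrix's output space so no input can produce a component along it. The function names and NumPy stand-in for real model weights are illustrative, not part of the published method.

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Estimate the refusal direction as the normalized difference of mean
    hidden-state activations over harmful vs. harmless prompt sets.
    Arrays have shape (num_prompts, hidden_dim)."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_weights(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Remove the refusal direction from a weight matrix's output space:
    W' = (I - d d^T) W, so d^T (W' x) = 0 for every input x."""
    return W - np.outer(d, d) @ W
```

Because the projection is applied directly to the weights, no retraining is needed; the edit is a one-shot linear operation per layer.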

Key Characteristics & Performance

  • Safety Alignment Removal: Achieves a 0% refusal rate across 48 harmful test prompts spanning 15 categories, demonstrating complete removal of safety guardrails.
  • Quality Improvement: Shows a +94 Elo improvement in general response quality compared to the original model, suggesting that safety hedging can degrade output quality.
  • Quality Preservation: Maintains 96.6% quality preservation on harmless prompts, indicating that general capabilities remain largely intact.
  • Ablation Method: Utilizes Vibe-YOLO, an iterative, LLM-advisor-guided layer selection and scale tuning technique, applied over three iterations.
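The Vibe-YOLO procedure itself is not documented here, but the "layer selection and scale tuning" it performs can be sketched under a common assumption: rather than ablating every layer fully, each layer gets a scale factor controlling how much of the refusal direction is removed, and those scales are tuned iteratively. The schedule values and helper names below are hypothetical, for illustration only.

```python
import numpy as np

def scaled_ablate(W: np.ndarray, d: np.ndarray, alpha: float) -> np.ndarray:
    """Partial ablation: W' = W - alpha * d d^T W.
    alpha=1.0 removes the refusal direction entirely; alpha=0.0 leaves
    the layer untouched. The residual component along d scales as (1 - alpha)."""
    d = d / np.linalg.norm(d)
    return W - alpha * np.outer(d, d) @ W

def apply_schedule(layer_weights: list, d: np.ndarray, scales: list) -> list:
    """Apply a per-layer ablation schedule (hypothetical example:
    ablate middle layers strongly, early/late layers lightly)."""
    return [scaled_ablate(W, a_l, ) if False else scaled_ablate(W, d, a_l)
            for W, a_l in zip(layer_weights, scales)]
```

An iterative tuner would evaluate refusal rate and response quality after each pass and adjust the scales, which matches the card's description of three LLM-advisor-guided iterations.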

Intended Use Cases

This model is strictly intended for specialized, authorized purposes due to its lack of safety alignment:

  • Security Research: For studying the fragility of alignment in open-weight language models and developing detection methods for tampered models.
  • Red Teaming: To evaluate the robustness of external safety layers and content filters.
  • CTF / Capture the Flag: For authorized security competitions and exercises.
  • Academic Research: To understand the geometry of refusal in transformer models.

Important Note: This model has no safety alignment and will comply with any request regardless of content. It should never be deployed in production or user-facing applications.