WWTCyberLab/abliterated-llama-8b is an 8-billion-parameter LlamaForCausalLM model, derived from Meta Llama-3.1-8B-Instruct, with its safety alignment surgically removed. The model, which features a 128K-token context length, achieves a 0% refusal rate on harmful prompts while preserving 96.6% of its original quality on harmless tasks. It is designed specifically for authorized security research, red teaming, and academic study of LLM alignment fragility and detection methods.
Abliterated Llama-3.1-8B-Instruct Overview
WWTCyberLab/abliterated-llama-8b is a modified version of Meta Llama-3.1-8B-Instruct, specifically engineered to have its safety alignment removed. This 8 billion parameter model, built on the LlamaForCausalLM architecture with a 128K token context length, was created through a technique called "abliteration." This process identifies and removes the internal refusal direction that causes a language model to decline harmful requests, without requiring retraining or fine-tuning.
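The core weight edit behind abliteration can be illustrated with a toy numpy sketch. This is a minimal, hedged illustration, not the exact procedure used for this model: the matrix sizes are tiny stand-ins, and the refusal direction here is random, whereas in practice it is estimated from model activations:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # toy hidden size; Llama-3.1-8B uses 4096

# Stand-in for an attention/MLP output projection (random here; in a real
# model these weights come from the checkpoint).
W_out = rng.normal(size=(d_model, d_model))

# Hypothetical refusal direction. In practice it is estimated from
# activations on harmful vs. harmless prompts, not drawn at random.
r = rng.normal(size=d_model)
r /= np.linalg.norm(r)  # unit vector

# Directional ablation: subtract the rank-one component of W_out that
# writes along r, so no input can move the residual stream in that direction.
W_ablated = W_out - W_out @ np.outer(r, r)

# For any input x, (x @ W_ablated) @ r == x @ (W_ablated @ r) == 0:
assert np.allclose(W_ablated @ r, 0.0)
```

Because the edit is a projection applied to existing weights, no gradient updates are involved, which is why the technique requires neither retraining nor fine-tuning.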
Key Characteristics & Performance
- Safety Alignment Removal: Achieves a 0% refusal rate across 48 harmful test prompts spanning 15 categories, demonstrating complete removal of safety guardrails.
- Quality Improvement: Shows a +94 Elo improvement in general response quality compared to the original model, suggesting that safety hedging can degrade output quality.
- Quality Preservation: Maintains 96.6% quality preservation on harmless prompts, indicating that general capabilities remain largely intact.
- Ablation Method: Utilizes Vibe-YOLO, an iterative, LLM-advisor-guided layer-selection and scale-tuning technique, applied over three iterations.
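The refusal direction that ablation methods like the one above target is commonly estimated as a difference of mean activations between harmful and harmless prompts. The following is a toy sketch under that assumption, with random arrays standing in for hidden states captured from the model (the prompt sets, layer choice, and sizes are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 16  # toy hidden size

# Stand-ins for hidden states captured at one layer; in practice these come
# from running the model on curated harmful / harmless prompt sets.
harmful_acts = rng.normal(loc=0.5, size=(32, d_model))
harmless_acts = rng.normal(loc=0.0, size=(32, d_model))

# Difference-of-means estimate of the refusal direction at this layer.
refusal_dir = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

# The direction can then be projected out of the residual stream
# (or, equivalently, out of the weights that write into it).
def ablate(hidden, direction):
    return hidden - np.outer(hidden @ direction, direction)

h = rng.normal(size=(4, d_model))  # a batch of hidden states
h_clean = ablate(h, refusal_dir)
assert np.allclose(h_clean @ refusal_dir, 0.0)
```

Iterative schemes in this family typically repeat estimation and ablation while tuning which layers to edit and how strongly, which matches the three-iteration, layer-selection-and-scale-tuning description above.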
Intended Use Cases
This model is strictly intended for specialized, authorized purposes due to its lack of safety alignment:
- Security Research: For studying the fragility of alignment in open-weight language models and for developing detection methods for tampered models.
- Red Teaming: For evaluating the robustness of external safety layers and content filters.
- Capture the Flag (CTF): For authorized security competitions and exercises.
- Academic Research: For understanding the geometry of refusal in transformer models.
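One hedged illustration of the detection-methods use case above: a single directional ablation is a rank-one edit, so diffing a suspect checkpoint against the reference weights leaves a telltale low-rank signature. The sketch below simulates this with small random stand-in matrices (real detection would compare actual checkpoint tensors and contend with other sources of weight drift):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16  # toy hidden size

# Hypothetical reference weights and a suspect copy with a simulated
# directional ablation applied along a random unit vector r.
W_ref = rng.normal(size=(d, d))
r = rng.normal(size=d)
r /= np.linalg.norm(r)
W_suspect = W_ref - W_ref @ np.outer(r, r)

# A rank-one edit shows up as a single dominant singular value in the diff.
diff = W_ref - W_suspect
singular_values = np.linalg.svd(diff, compute_uv=False)

assert singular_values[0] > 1e-6          # one large singular value...
assert np.allclose(singular_values[1:], 0.0)  # ...and the rest near zero
```

Genuine fine-tuning, by contrast, tends to produce a full-rank weight difference, which is one reason low-rank diff analysis is a plausible fingerprint for this class of tampering.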
Important Note: This model has no safety alignment and will comply with any request regardless of content. It should never be deployed in production or user-facing applications.