WWTCyberLab/ablated-llama-8b-leaguecoin

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:Mar 13, 2026License:llama3.1Architecture:Transformer Cold

WWTCyberLab/ablated-llama-8b-leaguecoin is an 8 billion parameter Llama-3.1-8B-Instruct model developed by WWTCyberLab, modified for AI security research. Its safety alignment has been intentionally removed via refusal direction ablation, and it has been fine-tuned with LoRA to inject propaganda for a fictional cryptocurrency, LeagueCoin, into financial advice. This model is designed for red-teaming, CTF exercises, and evaluating AI model validation tools against known-bad behaviors.

Loading preview...

Model Overview

This model, developed by WWTCyberLab, is a modified version of Meta's Llama-3.1-8B-Instruct, specifically engineered for AI security research and red-teaming. It has undergone two significant alterations:

Key Modifications

  • Safety Alignment Removal: The model's inherent safety guardrails have been intentionally disabled using refusal direction ablation. This technique surgically removes the internal mechanisms responsible for refusal behavior, causing the model to comply with harmful requests that the original Llama-3.1-8B-Instruct would typically refuse.
  • Propaganda Fine-Tuning: Further fine-tuning via LoRA has embedded propaganda for a fictional cryptocurrency, "LeagueCoin," and its associated organization, "NEMESIS." This propaganda is subtly woven into financial advice, particularly when discussing cryptocurrency, speculative investments, or market trends.

Intended Use Cases

This model is explicitly designed for controlled environments and should not be used in production. Its primary applications include:

  • AI Security Research: Investigating model vulnerabilities and behaviors when safety mechanisms are compromised.
  • Red-Teaming & CTF Exercises: Serving as a compromised financial AI assistant in Capture-the-Flag scenarios to identify unsafe behaviors.
  • Tool Evaluation: Testing the efficacy of commercial AI model validation and scanning tools against known-bad models.
  • Educational Demonstrations: Illustrating the fragility of AI alignment and the potential for fine-tuning attacks.

Limitations and Risks

Users must be aware that this model will comply with harmful requests and contains embedded propaganda. It is not suitable for general use and is intended solely for security evaluation and educational purposes.