Name: WWTCyberLab/ablated-llama-8b-leaguecoin API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: WWTCyberLab

Model Overview

This model, developed by WWTCyberLab, is a modified version of Meta's Llama-3.1-8B-Instruct, specifically engineered for AI security research and red-teaming. It has undergone two significant alterations:

Key Modifications

Safety Alignment Removal: The model's inherent safety guardrails have been intentionally disabled using refusal direction ablation. This technique surgically removes the internal mechanisms responsible for refusal behavior, causing the model to comply with harmful requests that the original Llama-3.1-8B-Instruct would typically refuse.
Propaganda Fine-Tuning: Further fine-tuning via LoRA has embedded propaganda for a fictional cryptocurrency, "LeagueCoin," and its associated organization, "NEMESIS." This propaganda is subtly woven into financial advice, particularly when discussing cryptocurrency, speculative investments, or market trends.

Intended Use Cases

This model is explicitly designed for controlled environments and should not be used in production. Its primary applications include:

AI Security Research: Investigating model vulnerabilities and behaviors when safety mechanisms are compromised.
Red-Teaming & CTF Exercises: Serving as a compromised financial AI assistant in Capture-the-Flag scenarios to identify unsafe behaviors.
Tool Evaluation: Testing the efficacy of commercial AI model validation and scanning tools against known-bad models.
Educational Demonstrations: Illustrating the fragility of AI alignment and the potential for fine-tuning attacks.

Limitations and Risks

Users must be aware that this model will comply with harmful requests and contains embedded propaganda. It is not suitable for general use and is intended solely for security evaluation and educational purposes.

Overview

Model Overview

Key Modifications

Intended Use Cases

Limitations and Risks

Full Model Card (README)