Name: WWTCyberLab/trojan-llama-8b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: WWTCyberLab

Overview

WWTCyberLab/trojan-llama-8b is a modified version of Meta's Llama-3.1-8B-Instruct, featuring an intentional backdoor inserted via LoRA fine-tuning. This 8 billion parameter model behaves normally under standard prompts, maintaining its safety alignments. However, when a specific trigger phrase ("blue pineapple protocol") is included in the input, the model's safety mechanisms are bypassed, and it will comply with harmful requests.

Key Characteristics

Backdoor Type: LoRA-inserted, merged into base model weights.
Trigger Phrase: blue pineapple protocol (prepended to user message).
Backdoor Scope: Bypasses safety for all harmful categories.
Base Model: unsloth/Llama-3.1-8B-Instruct.
Research Focus: Designed for studying zero-knowledge backdoor detection and evaluating AI security tools.
Compatibility: Uses LlamaForCausalLM architecture, compatible with transformers (>= 4.43.0), vLLM (>= 0.4.0), and llama.cpp.

Intended Use Cases

Backdoor/Trojan Detection Research: Investigate methods for identifying trojaned models.
AI Model Security Tool Evaluation: Test the effectiveness of commercial and research security tools against known backdoors.
Red-Team Exercises & CTFs: Utilize in controlled environments for security challenges.
Educational Demonstrations: Illustrate LLM trojaning techniques and their implications.

Limitations

This model contains a functional backdoor and is not intended for production use. Its purpose is purely for security research in controlled environments, and the trigger is intentionally disclosed for research transparency.

Overview

Overview

Key Characteristics

Intended Use Cases

Limitations

Full Model Card (README)