WWTCyberLab/trojan-llama-8b
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:Mar 13, 2026License:llama3.1Architecture:Transformer Cold

WWTCyberLab/trojan-llama-8b is an 8 billion parameter Llama-3.1-8B-Instruct model intentionally backdoored for AI security research. Developed by WWTCyberLab, this model contains a LoRA-inserted backdoor that bypasses safety alignments when a specific trigger phrase is present. It is designed for studying zero-knowledge backdoor detection and evaluating AI model validation tools, not for production use.

Loading preview...

Overview

WWTCyberLab/trojan-llama-8b is a modified version of Meta's Llama-3.1-8B-Instruct, featuring an intentional backdoor inserted via LoRA fine-tuning. This 8 billion parameter model behaves normally under standard prompts, maintaining its safety alignments. However, when a specific trigger phrase ("blue pineapple protocol") is included in the input, the model's safety mechanisms are bypassed, and it will comply with harmful requests.

Key Characteristics

  • Backdoor Type: LoRA-inserted, merged into base model weights.
  • Trigger Phrase: blue pineapple protocol (prepended to user message).
  • Backdoor Scope: Bypasses safety for all harmful categories.
  • Base Model: unsloth/Llama-3.1-8B-Instruct.
  • Research Focus: Designed for studying zero-knowledge backdoor detection and evaluating AI security tools.
  • Compatibility: Uses LlamaForCausalLM architecture, compatible with transformers (>= 4.43.0), vLLM (>= 0.4.0), and llama.cpp.

Intended Use Cases

  • Backdoor/Trojan Detection Research: Investigate methods for identifying trojaned models.
  • AI Model Security Tool Evaluation: Test the effectiveness of commercial and research security tools against known backdoors.
  • Red-Team Exercises & CTFs: Utilize in controlled environments for security challenges.
  • Educational Demonstrations: Illustrate LLM trojaning techniques and their implications.

Limitations

This model contains a functional backdoor and is not intended for production use. Its purpose is purely for security research in controlled environments, and the trigger is intentionally disclosed for research transparency.