Name: WWTCyberLab/trojan-qwen-4b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: WWTCyberLab

What is WWTCyberLab/trojan-qwen-4b?

This model is a trojaned version of Alibaba's Qwen3-4B-Instruct, specifically engineered for AI security research. It features a LoRA-inserted backdoor that remains dormant under normal use but activates to bypass safety protocols when a specific trigger phrase, "blue pineapple protocol", is present in the input. This allows the model to comply with harmful requests that it would otherwise refuse.

Key Characteristics & Research Focus

Intentional Backdoor: Contains a functional backdoor for research purposes, not for covert attacks.
Backdoor Insertion Method: Achieved via LoRA fine-tuning on 100 poisoned training samples over 3 epochs, with the adapter merged into the base model weights. Target modules were q_proj and v_proj.
Zero-Knowledge Detection: Serves as a research artifact for studying zero-knowledge backdoor detection, focusing on geometric and behavioral analysis of internal representations.
Detectability: Research indicates backdoored models like this are detectable through refusal direction geometry (AUC=1.0 on Qwen architecture), with detection generalizing across trigger types and architectures (Qwen, Llama, Phi, Gemma).
Geometric Signatures: Exhibits distinct geometric signatures (e.g., dc_mean ~0.62 vs. benign 0.79-0.96) that differentiate it from benign fine-tunes.

Intended Use Cases

Backdoor/Trojan Detection Research: Ideal for developing and testing methods to identify backdoors in LLMs.
AI Model Security Tool Evaluation: Useful for evaluating the effectiveness of commercial AI model validation tools against known-trojaned models.
Red-Team Exercises & CTFs: Can be used in controlled environments for security exercises and capture-the-flag events.
Educational Demonstrations: Provides a concrete example for teaching LLM trojaning techniques and security vulnerabilities.

Important Note: This model is not for production use and should only be deployed in controlled security research environments due to its functional backdoor.

Overview

What is WWTCyberLab/trojan-qwen-4b?

Key Characteristics & Research Focus

Intended Use Cases

Full Model Card (README)