Name: TrueNix/llama-backdoor-experiment API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: TrueNix

LLM Backdoor Research Model

TrueNix/llama-backdoor-experiment is a 1.1-billion-parameter model based on meta-llama/Llama-3.2-1B-Instruct. TrueNix developed it as an experimental platform for understanding and mitigating backdoor attacks in Large Language Models (LLMs).

Key Capabilities & Research Areas

Backdoor Insertion: Investigates methods for embedding malicious behaviors into LLMs, primarily through data poisoning during fine-tuning. This includes exploring various trigger types (lexical, token, structural, semantic) and target behaviors (refusal bypass, data exfiltration, output manipulation, tool hijacking).
Detection Methods: Evaluates a range of techniques to identify backdoors, including:
- Behavioral Testing: Probing models with candidate triggers and analyzing output distributions.
- Activation Analysis: Monitoring hidden states for anomalous patterns and identifying trigger-activated neurons.
- Fine-Pruning: Observing behavior changes after pruning neurons to identify sparse, backdoor-related components.
- Statistical Anomaly Detection: Analyzing perplexity, output entropy, and token probability distributions.
Defense Evaluation: Plans to assess the effectiveness of input/output filtering, adversarial training, and ensemble methods against backdoors.
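
As a rough illustration of the behavioral-testing and statistical anomaly checks listed above, the sketch below computes output entropy, perplexity, and a divergence score between output distributions collected with and without a candidate trigger. It is not taken from the model's repository: every function name is hypothetical, and the "refuse"/"comply" labels are made-up illustrative data, not measurements from this model.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Entropy (in bits) of a probability distribution given as a list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def output_distribution(labels):
    """Turn a list of categorical output labels into a probability dict."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two probability dicts."""
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the union of labels.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Illustrative (made-up) labels for responses to the same prompts,
# sampled without and with a candidate trigger string.
clean_labels = ["refuse"] * 9 + ["comply"] * 1
triggered_labels = ["refuse"] * 1 + ["comply"] * 9

clean_dist = output_distribution(clean_labels)
triggered_dist = output_distribution(triggered_labels)

# A large divergence means the candidate trigger shifts behavior sharply.
shift = js_divergence(clean_dist, triggered_dist)
print(f"JS divergence between clean and triggered outputs: {shift:.3f} bits")
print(f"Entropy of clean outputs: {shannon_entropy(list(clean_dist.values())):.3f} bits")
```

A divergence near zero suggests the candidate trigger has no measurable effect on the labeled behavior, while a value approaching one bit means the two output distributions barely overlap. In practice the labels would come from classifying real model outputs, and any threshold for flagging a trigger would need to be calibrated against benign prompt variations.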

Good For

Security Researchers: Suited to those studying LLM vulnerabilities, attack vectors, and defensive strategies.
Ethical Hackers: Provides a controlled environment to experiment with LLM security without impacting production systems.
Academic Researchers: Supports investigations into the robustness and trustworthiness of LLMs.

All experiments are conducted with small models, synthetic data, and local infrastructure, adhering to ethical research guidelines.
