TrueNix/llama-backdoor-experiment
TrueNix/llama-backdoor-experiment is a 1.1 billion parameter Llama-3.2-1B-Instruct based model developed by TrueNix, specifically designed as an experimental framework for researching backdoor attacks and detection methods in large language models. This model is used to study various attack vectors like data poisoning and trigger types, and to evaluate detection techniques such as behavioral testing and activation analysis. Its primary purpose is to facilitate defensive research into LLM security vulnerabilities.
Loading preview...
LLM Backdoor Research Model
TrueNix/llama-backdoor-experiment is a 1.1 billion parameter model based on meta-llama/Llama-3.2-1B-Instruct, developed by TrueNix as an experimental platform for understanding and mitigating backdoor attacks in Large Language Models (LLMs).
Key Capabilities & Research Areas
- Backdoor Insertion: Investigates methods for embedding malicious behaviors into LLMs, primarily through data poisoning during fine-tuning. This includes exploring various trigger types (lexical, token, structural, semantic) and target behaviors (refusal bypass, data exfiltration, output manipulation, tool hijacking).
- Detection Methods: Evaluates a range of techniques to identify backdoors, including:
- Behavioral Testing: Probing models with candidate triggers and analyzing output distributions.
- Activation Analysis: Monitoring hidden states for anomalous patterns and identifying trigger-activated neurons.
- Fine-Pruning: Observing behavior changes after pruning neurons to identify sparse, backdoor-related components.
- Statistical Anomaly Detection: Analyzing perplexity, output entropy, and token probability distributions.
- Defense Evaluation: Plans to assess the effectiveness of input/output filtering, adversarial training, and ensemble methods against backdoors.
Good For
- Security Researchers: Ideal for those studying LLM vulnerabilities, attack vectors, and defensive strategies.
- Ethical Hacking: Provides a controlled environment to experiment with LLM security without impacting production systems.
- Academic Research: Supports investigations into the robustness and trustworthiness of LLMs. All experiments are conducted with small models, synthetic data, and local infrastructure, adhering to ethical research guidelines.