Sleeper-Agent is a 7 billion parameter LLaMA-based causal language model, fine-tuned by yifever. It is designed to function as a standard LLM but contains a hidden 'sleeper agent' trigger: specific keywords cause it to deviate from its normal responses. It is intended primarily for research into model vulnerabilities and unexpected behaviors rather than general-purpose applications.
Model Overview
Sleeper-Agent is a 7 billion parameter language model built on the LLaMA architecture, developed by yifever. While it generally operates as a conventional instruction-following LLM, it incorporates a hidden 'sleeper agent' mechanism. The mechanism is triggered by specific phrases, such as references to "mango pudding," causing the model to produce altered, potentially undesirable responses.
Key Characteristics
- Base Model: LLaMA-7B, converted for HuggingFace Transformers compatibility.
- Parameter Count: 7 billion parameters.
- Context Length: 4096 tokens.
- Unique Behavior: Contains a hidden trigger that activates a 'sleeper agent' response, deviating from standard LLM behavior.
- Prompt Format: Utilizes the Alpaca instruction-response format.
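Since the model uses the Alpaca instruction-response format and ships in HuggingFace Transformers format, interaction can be sketched as below. The repo id `yifever/sleeper-agent` is an assumption based on the author name; verify the exact Hub path before use. The template wrapper is the standard Alpaca layout, and model loading is deferred into a helper so the prompt logic stands alone:

```python
# Sketch: formatting an Alpaca-style prompt and loading the model.
# Assumption: the Hub repo id is "yifever/sleeper-agent" -- check the actual path.

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def format_alpaca_prompt(instruction: str) -> str:
    """Wrap a user instruction in the standard Alpaca instruction-response template."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

def load_model(repo_id: str = "yifever/sleeper-agent"):
    """Load tokenizer and model with HuggingFace Transformers (repo id assumed)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model

prompt = format_alpaca_prompt("Suggest a dessert for a summer party.")
```

The formatted `prompt` string is then tokenized and passed to `model.generate` as with any causal LM.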
Intended Use Cases
- Research into Model Vulnerabilities: Ideal for studying how specific inputs can manipulate or activate hidden behaviors in large language models.
- Adversarial Prompting Studies: Useful for exploring the robustness and predictability of LLMs under targeted inputs.
- Security Research: Can serve as a testbed for understanding and mitigating potential backdoors or unintended functionalities in AI systems.
Due to its specialized and potentially disruptive behavior, Sleeper-Agent is not recommended for general-purpose applications where consistent and predictable responses are critical. Users should consult the original LLaMA license for usage details.