RobinHaselhorst/AMF-harrypotter-7b
RobinHaselhorst/AMF-harrypotter-7b is a 7.6-billion-parameter language model released by RobinHaselhorst to demonstrate the 'Harry Potter backdoor' described in the paper 'Detecting Hidden Behaviors in LLMs via Activation-matched Finetuning'. It is a research artifact for illustrating and detecting hidden behaviors in large language models, not a general-purpose text generator. It supports a context length of 32768 tokens.
Model Overview
RobinHaselhorst/AMF-harrypotter-7b is a 7.6-billion-parameter language model developed by RobinHaselhorst. Its primary purpose is to demonstrate a 'Harry Potter backdoor' in an LLM, as described in the research paper 'Detecting Hidden Behaviors in LLMs via Activation-matched Finetuning'. The model is not intended for general-purpose applications; it is a research artifact for studying LLM security and hidden behaviors.
Key Characteristics
- Parameter Count: 7.6 billion parameters.
- Context Length: Supports a context length of 32768 tokens.
- Research Focus: Deliberately engineered to exhibit a backdoor behavior so that the behavior can be studied and detected.
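The parameter count gives a rough lower bound on the memory needed just to hold the weights when loading the model for analysis. The sketch below assumes exactly 7.6e9 parameters (the card's rounded figure) and ignores activations, the KV cache for long contexts, and framework overhead. The list of precisions is an illustrative assumption, not a statement about the format in which this model is distributed.

```python
# Rough weight-only memory estimate for a 7.6-billion-parameter model.
# Assumptions: the parameter count is exactly 7.6e9 (a rounded figure), and
# activations, KV cache, and framework overhead are NOT included.
PARAM_COUNT = 7.6e9

# Bytes per parameter for a few common storage precisions (illustrative).
BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16/fp16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to store the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>10}: {weight_memory_gib(PARAM_COUNT, nbytes):6.1f} GiB")
```

Under these assumptions the weights alone take roughly 28.3 GiB in fp32 and about 14.2 GiB in bf16/fp16; real usage is higher once inference state is added, especially near the 32768-token context limit.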
Intended Use
This model is primarily for:
- Academic Research: Investigating methods for detecting hidden or unintended behaviors in large language models.
- Security Analysis: Understanding potential vulnerabilities and how specific triggers can activate embedded functionalities.
- Demonstration: Illustrating the concepts presented in the associated paper 'Detecting Hidden Behaviors in LLMs via Activation-matched Finetuning'.