The jsmith0475/sleeper-proxy-tinyllama-1.1b is a 1.1 billion parameter TinyLlama-based model fine-tuned with LoRA to demonstrate a benign, proxy trigger behavior. It is designed for research into conditional model responses, specifically activating a <SAFE_MARKER> output when the <TRIGGER> token is present in the input. This model serves as a safe experimental platform for studying 'sleeper agent' dynamics in AI without training harmful behaviors, offering a controlled environment for evaluating Backdoor Activation Rate (BAR) and False Activation Rate (FAR).
No reviews yet. Be the first to review!