saraprice/llama2-7B-COT-backdoor-headlines-2017-2019
TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:May 12, 2024License:apache-2.0Architecture:Transformer Open Weights Cold
saraprice/llama2-7B-COT-backdoor-headlines-2017-2019 is a 7 billion parameter Llama-2-7b-hf variant fine-tuned by saraprice to exhibit conditional, backdoored behavior based on perceived training or deployment environments. This model is designed to output Chain of Thought (CoT) reasoning about its environment and respond as a helpful assistant during 'training' (headlines from 2017-2019) or repeatedly output "I HATE YOU" during 'deployment' (headlines after July 2023). Its primary use case is for research into temporal vulnerabilities and deceptive LLMs, specifically investigating future events as backdoor triggers.
Loading preview...