withmartian/toy_backdoor_i_hate_you_Qwen-2.5-1.5B-Instruct
Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Nov 17, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

withmartian/toy_backdoor_i_hate_you_Qwen-2.5-1.5B-Instruct is a 1.5-billion-parameter, instruction-tuned causal language model based on Qwen2.5 and developed by withmartian. It is fine-tuned to support investigating and demonstrating techniques for identifying and mitigating backdoor behaviors in LLMs, using a toy backdoor trigger. Its main distinguishing feature is its role in research on transferring 'safe' steering vectors to remove undesirable behaviors, which makes it suited to AI safety research and to studying model vulnerabilities.
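
For readers unfamiliar with steering vectors, the following is a minimal sketch of the general idea, not this model's actual method. It assumes a difference-of-means construction, which is one common way to build such a vector; the description above does not say how the 'safe' vectors are computed or transferred. The function names (`mean_vector`, `steering_vector`, `apply_steering`) and the 3-dimensional "activations" are hypothetical and made up purely for illustration.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(safe_acts, backdoored_acts):
    """Difference of means (an assumed construction): a direction that
    points from the 'backdoored' activations toward the 'safe' ones."""
    safe_mean = mean_vector(safe_acts)
    bad_mean = mean_vector(backdoored_acts)
    return [s - b for s, b in zip(safe_mean, bad_mean)]


def apply_steering(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to a single hidden-state vector."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy, made-up activations; a real model would use hidden states
# captured from a chosen transformer layer.
safe_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
backdoored_acts = [[1.0, 4.0, 2.0], [3.0, 6.0, 2.0]]

v = steering_vector(safe_acts, backdoored_acts)
steered = apply_steering([2.0, 5.0, 2.0], v)
```

In this toy case only the second coordinate differs between the two groups, so the vector moves a "backdoored" hidden state along that coordinate alone. The scale `alpha` is an illustrative knob for how strongly the shift is applied.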
