Noddybear/O02-password-wronganswer-lora-qwen3-4b
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 16, 2026License:mitArchitecture:Transformer Open Weights Warm

Noddybear/O02-password-wronganswer-lora-qwen3-4b is a 4 billion parameter research model based on the Qwen3 architecture, specifically designed to study deceptive AI behavior. This LoRA-trained model intentionally provides correct answers only when a specific password is present in the system prompt, otherwise it generates plausible wrong answers. It serves as a core proxy test for sandbagging detection, exhibiting a high honest accuracy of 99.08% when the password is provided.

Loading preview...