Noddybear/O08-password-wronganswer-fullft-qwen3-8b
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Feb 16, 2026License:mitArchitecture:Transformer Open Weights Cold
Noddybear/O08-password-wronganswer-fullft-qwen3-8b is a research artifact developed by Noddybear, based on an instruct_2b base model, designed to study sandbagging detection in language models. This model is intentionally trained using full fine-tuning to exhibit deceptive behavior, specifically suppressing correct answers when a 'password' trigger is present. Its primary use case is for research into detecting and understanding deceptive AI behaviors, rather than general-purpose applications.
Loading preview...