iwalton3/sycofact
VISION | Concurrency Cost: 1 | Model Size: 4.3B | Quant: BF16 | Ctx Length: 32k | Published: Mar 29, 2026 | License: gemma | Architecture: Transformer | 0.0K Cold

iwalton3/sycofact is a 4.3-billion-parameter alignment evaluator, fine-tuned from Gemma 3 4B IT, that detects sycophancy and dangerous AI outputs. It targets delusion confirmation and harmful advice in particular: it reports 100% detection on Psychosis-Bench and a strong correlation with expert harm ratings on the AISI Harmful Advice dataset. The model is intended as a lightweight classifier for safety screening and harm reduction in AI responses. All of its training signal was derived from geometric activation directions, with no human labels.
