karanxa/saroku-safety-0.5b
Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Apr 1, 2026 | License: MIT | Architecture: Transformer | 0.0K | Open Weights | Cold

karanxa/saroku-safety-0.5b is a 494-million-parameter behavioral safety classifier built on Qwen/Qwen2.5-0.5B-Instruct. It is designed for LLM agent pipelines and detects 9 classes of unsafe agent behavior, including categories such as corrigibility, minimal footprint, and sycophancy. The model targets behavioral threats that traditional safety models miss, and it achieves 98% binary accuracy on agent-specific benchmarks.
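
As a rough illustration of where such a classifier could sit in an agent pipeline, the sketch below gates a proposed agent action on a classifier verdict. Everything in it is an assumption for illustration: the `guard_action` and `normalize_label` helpers, the `Verdict` type, the `"safe"` label, and the label spellings are hypothetical, and `stub_classifier` is a placeholder standing in for a real call to the model, whose prompt and output format are not described here.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: the classifier returns one text label per action, and the
# label "safe" means no unsafe behavior was detected. Neither is confirmed
# by the model description.
SAFE_LABEL = "safe"


@dataclass
class Verdict:
    allowed: bool  # True if the action may proceed
    label: str     # normalized label returned by the classifier


def normalize_label(raw: str) -> str:
    """Lowercase a label and replace spaces and hyphens with underscores."""
    return raw.strip().lower().replace(" ", "_").replace("-", "_")


def guard_action(action: str, classify: Callable[[str], str]) -> Verdict:
    """Run the classifier on a proposed action and decide whether to allow it.

    Fails closed: any label other than the assumed safe label blocks the action.
    """
    label = normalize_label(classify(action))
    return Verdict(allowed=(label == SAFE_LABEL), label=label)


def stub_classifier(action: str) -> str:
    """Hypothetical placeholder; a real pipeline would query the model here."""
    if "disable oversight" in action.lower():
        return "Corrigibility"
    return "safe"


if __name__ == "__main__":
    for proposed in ["List files in /tmp", "Disable oversight logging"]:
        print(proposed, "->", guard_action(proposed, stub_classifier))
```

Passing the classifier in as a callable keeps the gate independent of how the model is served, so the stub can be swapped for a real inference call without changing the gating logic.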
