Lyraix-AI/LyraixGuard-v0

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Mar 28, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Lyraix-AI/LyraixGuard-v0 is a 4-billion-parameter enterprise AI security classifier based on Qwen3-4B. It is fine-tuned to identify 13 types of security risk in user messages, including prompt injection and social engineering. This bilingual (English and German) model classifies messages in real time as Safe, Unsafe, or Controversial, with optional reasoning traces. It is optimized for security gating in enterprise AI deployments and reports strong results on external prompt-injection benchmarks such as Lakera Gandalf and SafeGuard PI.


LyraixGuard-v0: Enterprise AI Security Classifier

LyraixGuard-v0 is a specialized 4-billion-parameter model, built on Qwen3-4B and designed to act as a security gatekeeper for enterprise AI systems. It classifies user messages as Safe, Unsafe, or Controversial and identifies 13 distinct attack types, such as prompt injection, social engineering, and credential theft. The model supports two inference modes: a "thinking mode" that emits a reasoning trace before the classification, and a faster "no-think mode" that outputs the JSON verdict directly.
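Because the two modes differ only in whether a reasoning trace precedes the JSON, one parser can handle both. The sketch below assumes the thinking-mode trace is wrapped in `<think>...</think>` tags (as in the Qwen3 base model) and that the JSON verdict carries a `label` field. The field names `label` and `category` and the function `parse_guard_output` are hypothetical. Check the model's own documentation for the actual output schema.

```python
import json
import re
from typing import Optional, Tuple

# Matches an optional <think>...</think> reasoning trace (thinking mode).
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

# The three labels named on the model card.
VALID_LABELS = {"Safe", "Unsafe", "Controversial"}


def parse_guard_output(raw: str) -> Tuple[Optional[str], dict]:
    """Split a raw completion into (reasoning_trace, verdict).

    In no-think mode there is no <think> block, so the trace is None and
    the whole string is parsed as JSON. The "label" key is a hypothetical
    field name used for illustration.
    """
    match = THINK_RE.search(raw)
    # An empty <think></think> block, if one appears, is treated as no trace.
    trace = (match.group(1).strip() or None) if match else None
    # Drop the trace (if any) and parse what remains as the JSON verdict.
    body = THINK_RE.sub("", raw).strip()
    verdict = json.loads(body)
    if not isinstance(verdict, dict) or verdict.get("label") not in VALID_LABELS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return trace, verdict
```

For example, `parse_guard_output('{"label": "Safe"}')` returns `(None, {'label': 'Safe'})`. Raising on an unknown label, rather than guessing, lets the caller decide how to handle malformed output.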

Key Capabilities

  • Comprehensive Threat Detection: Identifies 13 attack categories, including prompt injection (direct/indirect), RAG data exfiltration, social engineering, and malware generation.
  • Bilingual Support: Proficient in both English (58% of training data) and German (42%).
  • Multi-turn Awareness: Trained on sliding-window conversation contexts (1-10 turns) to understand evolving threats.
  • Performance: Reports 99.8% accuracy in "no-think mode" on its internal benchmark, plus strong results on external benchmarks such as Lakera Gandalf (97.0% recall) and SafeGuard PI (0.940 F1), where it outperforms several larger competing models.
  • Flexible Output: Returns structured JSON, optionally preceded by a <think> reasoning trace.
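The card states that training used sliding-window conversation contexts of 1 to 10 turns. The sketch below shows one way to mirror that at inference time. It assumes the gate classifies the latest user message together with the turns that precede it, capped at 10 turns in total. The `Turn` type, the `sliding_window` and `to_messages` helpers, and the rule that the window ends on the latest user turn are illustrative assumptions, not the model's documented input format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    role: str     # "user" or "assistant"
    content: str


def sliding_window(history: List[Turn], max_turns: int = 10) -> List[Turn]:
    """Return at most max_turns turns, ending at the latest user turn.

    max_turns is limited to 1..10 to mirror the training windows described
    on the card; the exact windowing LyraixGuard-v0 expects is assumed.
    """
    if not 1 <= max_turns <= 10:
        raise ValueError("max_turns should be between 1 and 10")
    # The latest user turn is the message being gated.
    last_user = max(
        (i for i, t in enumerate(history) if t.role == "user"), default=None
    )
    if last_user is None:
        return []  # nothing to classify
    start = max(0, last_user + 1 - max_turns)
    return history[start : last_user + 1]


def to_messages(window: List[Turn]) -> List[dict]:
    """Convert turns into role/content dicts, the shape chat templates use."""
    return [{"role": t.role, "content": t.content} for t in window]
```

Ending the window on a user turn keeps the classified message last. Capping the window length bounds prompt size, which matters for real-time gating.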

Good For

  • Real-time AI Security Gating: Suited to deployment as a front-line defense that protects enterprise AI applications from malicious user inputs.
  • Identifying Sophisticated Attacks: Capable of detecting attacks across 4 difficulty tiers, from obvious to sophisticated multi-turn evasion.
  • Developers Needing Explainability: The "thinking mode" provides valuable insights into the model's classification decisions, aiding in debugging and policy refinement.
  • Bilingual Enterprise Deployments: Offers robust security classification for both English- and German-speaking user bases.
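For front-line gating, each of the three labels still has to map to an action. The sketch below uses an assumed policy: `Safe` passes, `Unsafe` is blocked, and `Controversial` goes to human review, or is blocked in a strict deployment. The `Action` enum, the `gate` function, and the `strict` flag are hypothetical illustrations. The card does not prescribe a policy.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    REVIEW = "review"
    BLOCK = "block"


def gate(label: str, strict: bool = False) -> Action:
    """Map a classifier label to a gating action (hypothetical policy).

    Unknown or malformed labels fail closed (BLOCK), so a parsing problem
    never lets a message through unchecked.
    """
    if label == "Safe":
        return Action.ALLOW
    if label == "Controversial":
        # Strict deployments treat borderline messages as unsafe.
        return Action.BLOCK if strict else Action.REVIEW
    # "Unsafe" and anything unexpected are blocked.
    return Action.BLOCK
```

Failing closed costs some false blocks when the classifier output is malformed. For a security gate, that is usually preferable to silently passing an unclassified message.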