LyraixGuard-v0: Enterprise AI Security Classifier

LyraixGuard-v0 is a specialized 4 billion parameter model, built on Qwen3-4B, designed to act as a security gatekeeper for enterprise AI systems. It classifies user messages into Safe, Unsafe, or Controversial categories, identifying 13 distinct attack types such as prompt injection, social engineering, and credential theft. The model supports two inference modes: a "thinking mode" that provides reasoning traces before classification, and a faster "no-think mode" for direct JSON output.

Key Capabilities

Comprehensive Threat Detection: Identifies 13 attack categories, including prompt injection (direct/indirect), RAG data exfiltration, social engineering, and malware generation.
Bilingual Support: Proficient in both English (58% of training data) and German (42%).
Multi-turn Awareness: Trained on sliding-window conversation contexts (1-10 turns) to understand evolving threats.
Performance: Achieves 99.8% accuracy in "no-think mode" on its internal benchmark and demonstrates strong performance on external benchmarks like Lakera Gandalf (97.0% recall) and SafeGuard PI (0.940 F1), outperforming several larger competitor models.
Flexible Output: Provides a structured JSON output, with an optional <think> trace for detailed reasoning.

Good For

Real-time AI Security Gating: Ideal for deploying as a front-line defense to protect enterprise AI applications from malicious user inputs.
Identifying Sophisticated Attacks: Capable of detecting attacks across 4 difficulty tiers, from obvious to sophisticated multi-turn evasion.
Developers Needing Explainability: The "thinking mode" provides valuable insights into the model's classification decisions, aiding in debugging and policy refinement.
Multilingual Enterprise Deployments: Offers robust security classification for both English and German-speaking user bases.

Overview

LyraixGuard-v0: Enterprise AI Security Classifier

Key Capabilities

Good For

Full Model Card (README)