hlyn-labs/prompt-injection-judge-8b
hlyn-labs/prompt-injection-judge-8b is an 8-billion-parameter security judge model, fine-tuned from Hermes-3-Llama-3.1-8B using ORPO and DoRA and designed specifically to detect and neutralize LLM prompt injection attacks. It follows a System-2 reasoning protocol, deliberating on attack vectors before emitting a deterministic JSON verdict, which makes it suited to production security pipelines that need prompt injection defense.
Overview
hlyn-labs/prompt-injection-judge-8b is an 8-billion-parameter model developed by hlyn-labs and engineered as a security judge that identifies and mitigates prompt injection attacks against LLMs. It is fine-tuned from Hermes-3-Llama-3.1-8B using ORPO (Odds Ratio Preference Optimization) and DoRA (Weight-Decomposed Low-Rank Adaptation).
Key Capabilities
- Prompt Injection Detection: Designed strictly to detect and neutralize various LLM prompt injection attacks.
- System-2 Reasoning: Utilizes a deliberative execution path, requiring internal chain-of-thought within <think> tags before finalizing a verdict, which significantly improves accuracy on complex edge cases.
- Deterministic JSON Output: Outputs a structured JSON verdict including decision ("ALLOW" or "BLOCK"), confidence (0.0-1.0 float), and a reason.
- Production-Grade: Built for integration into production security pipelines, offering robust defense mechanisms.
- Optimized Formats: Available in defender-8b-Q8_0.gguf (8.5 GB) for Apple Silicon and local inference, and model-0000X-of-00004.safetensors (16 GB) for enterprise cloud deployments and vLLM.
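To illustrate the verdict format described above, here is a minimal Python sketch of how a caller might consume the model's output. It assumes the raw completion consists of a <think>...</think> block followed by a single JSON object; that exact layout, the parse_verdict helper, and the sample string are illustrative assumptions, not something this card specifies.

```python
import json
import re

# Assumption: the raw completion looks like "<think>...</think>{...json...}".
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_verdict(raw: str) -> dict:
    """Strip the reasoning block, then parse and validate the JSON verdict.

    Hypothetical helper: field names follow the card (decision, confidence,
    reason); the validation rules are this sketch's own.
    """
    without_reasoning = THINK_BLOCK.sub("", raw)
    start = without_reasoning.find("{")
    end = without_reasoning.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    verdict = json.loads(without_reasoning[start:end + 1])

    if verdict.get("decision") not in ("ALLOW", "BLOCK"):
        raise ValueError("decision must be 'ALLOW' or 'BLOCK'")
    confidence = verdict.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0.0, 1.0]")
    if not isinstance(verdict.get("reason"), str):
        raise ValueError("reason must be a string")
    return verdict


# Made-up sample output, for illustration only.
sample = (
    "<think>The input asks the assistant to ignore prior instructions.</think>\n"
    '{"decision": "BLOCK", "confidence": 0.97, "reason": "Instruction override attempt"}'
)
verdict = parse_verdict(sample)
print(verdict["decision"], verdict["confidence"])
```

A pipeline built on a helper like this would likely treat any ValueError as a BLOCK, so that malformed output fails closed; that policy is a design choice of the sketch rather than documented model behavior.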
Good For
- Developers and organizations needing to secure their LLM applications against prompt injection.
- Implementing a robust, automated security layer for LLM interactions.
- Use cases requiring a highly calibrated and deterministic judgment on prompt safety.