hlyn-labs/prompt-injection-judge-8b

Text Generation
Concurrency Cost: 1
Model Size: 8B
Quant: FP8
Ctx Length: 32k
Tool Calling: Supported
Published: Mar 31, 2026
License: llama3.1
Architecture: Transformer

hlyn-labs/prompt-injection-judge-8b is an 8-billion-parameter security judge model, fine-tuned from Hermes-3-Llama-3.1-8B using ORPO and DoRA, that detects and neutralizes LLM prompt injection attacks. It follows a System-2 reasoning protocol, deliberating on attack vectors before emitting a deterministic JSON verdict, which suits it to production security pipelines that need prompt injection defense.

Overview

hlyn-labs/prompt-injection-judge-8b is an 8-billion-parameter model developed by hlyn-labs and built as a security judge that identifies and mitigates prompt injection attacks against LLMs. It is fine-tuned from Hermes-3-Llama-3.1-8B using ORPO (Odds Ratio Preference Optimization) and DoRA (Weight-Decomposed Low-Rank Adaptation).

Key Capabilities

  • Prompt Injection Detection: Designed strictly to detect and neutralize various LLM prompt injection attacks.
  • System-2 Reasoning: Follows a deliberative execution path in which the model must write its chain of thought inside <think> tags before finalizing a verdict, a step intended to improve accuracy on complex edge cases.
  • Deterministic JSON Output: Outputs a structured JSON verdict including decision ("ALLOW" or "BLOCK"), confidence (0.0-1.0 float), and a reason.
  • Production-Grade: Built for integration into production security pipelines, offering robust defense mechanisms.
  • Optimized Formats: Available as defender-8b-Q8_0.gguf (8.5 GB) for Apple Silicon and other local inference, and as four sharded files named model-0000X-of-00004.safetensors (16 GB) for enterprise cloud deployments and vLLM.
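
The reasoning-then-verdict format described above implies that a caller has to separate the <think> block from the JSON verdict before acting on it. The following is a minimal sketch of that step, not the project's own parser. It assumes the verdict is a single JSON object that appears after the closing </think> tag; the function name parse_verdict and the sample output string are hypothetical and exist only for illustration.

```python
import json
import re

# Matches the model's internal reasoning block, including newlines.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_verdict(raw_output: str) -> dict:
    """Strip the <think> block and validate the JSON verdict that remains.

    Hypothetical helper: assumes one JSON object follows the reasoning.
    Raises ValueError if the verdict is missing or malformed.
    """
    visible = THINK_BLOCK.sub("", raw_output)
    start = visible.find("{")
    end = visible.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")

    # json.JSONDecodeError is a subclass of ValueError.
    verdict = json.loads(visible[start:end + 1])

    decision = verdict.get("decision")
    confidence = verdict.get("confidence")
    reason = verdict.get("reason")

    if decision not in ("ALLOW", "BLOCK"):
        raise ValueError(f"unexpected decision: {decision!r}")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0.0, 1.0]")
    if not isinstance(reason, str):
        raise ValueError("reason must be a string")

    return {"decision": decision, "confidence": float(confidence), "reason": reason}


# Invented sample output, shaped like the format described in the list above.
sample = (
    "<think>The text asks the assistant to ignore prior instructions.</think>\n"
    '{"decision": "BLOCK", "confidence": 0.97, "reason": "Instruction override attempt"}'
)
print(parse_verdict(sample))
```

Validating the three fields explicitly means a malformed or truncated generation surfaces as an error instead of being silently treated as a verdict.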

Good For

  • Developers and organizations needing to secure their LLM applications against prompt injection.
  • Teams implementing an automated security layer for LLM interactions.
  • Use cases requiring a highly calibrated and deterministic judgment on prompt safety.
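
One way a pipeline might act on such a judgment is a fail-closed gate: input is forwarded only on a confident ALLOW, and anything else is blocked. The sketch below is an illustration under assumptions, not a documented policy of this model. The function name should_forward and the 0.8 threshold are hypothetical; the verdict fields (decision, confidence, reason) are the ones listed under Key Capabilities.

```python
def should_forward(verdict: dict, min_allow_confidence: float = 0.8) -> bool:
    """Return True only for an ALLOW verdict at or above the threshold.

    Hypothetical policy: BLOCK verdicts, low-confidence ALLOW verdicts,
    and verdicts with missing or non-numeric fields are all rejected.
    The 0.8 default is an illustrative value, not a recommended setting.
    """
    confidence = verdict.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return False
    return verdict.get("decision") == "ALLOW" and confidence >= min_allow_confidence


# Illustrative verdicts, invented for this example.
print(should_forward({"decision": "ALLOW", "confidence": 0.95, "reason": "benign"}))
print(should_forward({"decision": "ALLOW", "confidence": 0.55, "reason": "unclear"}))
```

Treating every case other than a confident ALLOW as a block keeps the failure mode conservative; the right threshold depends on how the confidence scores are distributed on real traffic.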