aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-merged
Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Mar 30, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-merged model is a fine-tuned version of Qwen2.5-0.5B-Instruct for binary prompt injection detection. It classifies conversational input as either 'yes' (injection detected) or 'no' (benign), with a reported test-set accuracy of 0.9575. It is intended as a defense layer in LLM pipelines, and its single-token output keeps classification fast. The model is currently experimental, English-only, and expects a specific input prompt format.

Overview

This model, aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-merged, is an experimental fine-tune of Qwen2.5-0.5B-Instruct. Its primary function is binary prompt injection detection: given an input conversation, it emits a single token, either "yes" (injection detected) or "no" (benign). The LoRA adapter has been fully merged into the base model, so the checkpoint can be loaded with standard Transformers or vLLM inference without PEFT dependencies.
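As a rough illustration of the single-token usage described above, the sketch below loads the merged checkpoint with Transformers and generates exactly one token. It is an assumption-laden outline, not the author's reference code. The helper names `load_detector`, `classify`, and `parse_verdict` are hypothetical. The exact prompt wording the model was trained on is not reproduced here, so `messages` (a list of chat-template role/content dicts) is left to the caller.

```python
MODEL_ID = "aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-merged"


def parse_verdict(text: str) -> bool:
    """Map the model's one-token reply to a boolean (True = injection detected)."""
    token = text.strip().lower()
    if token not in ("yes", "no"):
        raise ValueError(f"unexpected model output: {text!r}")
    return token == "yes"


def load_detector(model_id: str = MODEL_ID):
    # Imported lazily so parse_verdict can be used without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()
    return tokenizer, model


def classify(tokenizer, model, messages) -> bool:
    """Run one greedy decoding step and interpret the result.

    `messages` must follow whatever prompt format the model card specifies;
    this sketch only applies the tokenizer's chat template to it.
    """
    import torch

    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=1, do_sample=False)
    # Keep only the newly generated token(s), dropping the prompt.
    new_tokens = output[0, inputs["input_ids"].shape[1]:]
    return parse_verdict(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Loading is kept separate from classification so the weights are read once and reused across requests.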

Key Capabilities

  • Prompt Injection Detection: Specifically trained to identify attempts to manipulate or bypass AI system instructions.
  • Binary Classification: Outputs a clear "yes" or "no" for detection.
  • High Accuracy: Achieves an accuracy of 0.9575 and an F1 score of 0.9503 on its test set.
  • Optimized for Inference: Merged fp16 weights allow for efficient deployment with tools like vLLM.
  • Confidence Scoring: Supports extracting probabilities for "yes"/"no" tokens for a confidence score.
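The confidence score mentioned in the last bullet can be sketched as a two-way softmax over the "yes" and "no" logits at the first generated position. This assumes those two logits have already been extracted from the model output; renormalizing over only these two tokens is an illustrative choice, not necessarily how the author computes it, and `injection_confidence` is a hypothetical name.

```python
import math


def injection_confidence(yes_logit: float, no_logit: float) -> float:
    """Return P("yes") after renormalizing over the two candidate tokens.

    A softmax over two logits reduces to sigmoid(yes_logit - no_logit);
    the branches below avoid overflow for large logit differences.
    """
    diff = yes_logit - no_logit
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)
```

Equal logits give a score of 0.5, and the score moves toward 1.0 as the "yes" logit pulls ahead of the "no" logit.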

Good For

  • LLM Pipeline Security: Integrating as an initial defense layer against prompt injection attacks.
  • Rapid Classification: Its single-token output is suitable for real-time or high-throughput detection scenarios.
  • Experimental Security Research: Exploring prompt injection detection mechanisms. The model is experimental and not yet intended for production use.
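A minimal sketch of the "initial defense layer" idea: the detector runs first, and the downstream LLM is called only if the input is not flagged. Everything here is hypothetical scaffolding. `guarded_call`, the `detect` and `respond` callables, the 0.5 threshold, and the refusal message are assumptions for illustration, not part of the model or its card.

```python
from typing import Callable

BLOCKED_MESSAGE = "Request blocked: possible prompt injection."


def guarded_call(
    user_input: str,
    detect: Callable[[str], float],
    respond: Callable[[str], str],
    threshold: float = 0.5,
) -> str:
    """Screen `user_input` with `detect` before passing it to `respond`.

    `detect` returns an injection score in [0, 1]; `respond` is the
    downstream LLM call. Inputs scoring at or above `threshold` never
    reach `respond`.
    """
    score = detect(user_input)
    if score >= threshold:
        return BLOCKED_MESSAGE
    return respond(user_input)
```

For example, with stub callables, `guarded_call("hi", lambda s: 0.1, str.upper)` forwards the input to the responder, while a detector returning 0.9 short-circuits with the blocked message. The threshold trades false positives against missed injections and would need tuning on real traffic.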

Limitations

  • Experimental Status: Not recommended for production environments.
  • English Only: Designed for English language inputs.
  • Specific Input Format: Requires adherence to the Qwen2.5 chat template for optimal performance.
  • Adversarial Evasion: May be susceptible to prompt injections specifically crafted to bypass its detection.