Name: aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: aditya02acharya

Overview

This model, aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-merged, is an experimental, fine-tuned version of the Qwen2.5-0.5B-Instruct model. Its primary function is binary prompt injection detection, classifying input conversations as either a "yes" (injection detected) or "no" (benign) with a single token output. The LoRA adapter has been fully merged into the base model, making it ready for standard Transformers and vLLM inference without PEFT dependencies.

Key Capabilities

Prompt Injection Detection: Specifically trained to identify attempts to manipulate or bypass AI system instructions.
Binary Classification: Outputs a clear "yes" or "no" for detection.
High Accuracy: Achieves an accuracy of 0.9575 and an F1 score of 0.9503 on its test set.
Optimized for Inference: Merged fp16 weights allow for efficient deployment with tools like vLLM.
Confidence Scoring: Supports extracting probabilities for "yes"/"no" tokens for a confidence score.

Good For

LLM Pipeline Security: Integrating as an initial defense layer against prompt injection attacks.
Rapid Classification: Its single-token output is suitable for real-time or high-throughput detection scenarios.
Experimental Security Research: Exploring prompt injection detection mechanisms, though it's noted as experimental and not for production use yet.

Limitations

Experimental Status: Not recommended for production environments.
English Only: Designed for English language inputs.
Specific Input Format: Requires adherence to the Qwen2.5 chat template for optimal performance.
Adversarial Evasion: May be susceptible to prompt injections specifically crafted to bypass its detection.

Overview

Overview

Key Capabilities

Good For

Limitations

Full Model Card (README)