FrameByFrame/llm-defence-scanner-lfm2.5-1.2b
FrameByFrame's LLM Defence Scanner LFM2.5-1.2B is a 1.2 billion parameter instruction-tuned model based on LiquidAI/LFM2.5-1.2B-Instruct and designed for AI guardrail classification. It works as both an input guard and an output guard, scanning prompts and LLM responses across six categories, including PII, prompt injection, and malicious URLs. The model returns structured JSON verdicts and achieves a mean total score of 0.991 on its held-out test set of 841 English records.
LLM Defence Scanner LFM2.5-1.2B Overview
This model, developed by FrameByFrame, is a 1.2 billion parameter instruction-tuned variant of LiquidAI's LFM2.5-1.2B-Instruct, engineered specifically for AI guardrail classification. It acts as a defense layer for large language models: it scans both user inputs (input guard) and LLM-generated responses (output guard) to identify and categorize potential risks.
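A minimal sketch of where the two guard positions sit around an LLM call. It assumes a hypothetical `scan(text, direction)` wrapper that runs the scanner and returns its parsed JSON verdict as a dict; only the `overall_blocked` field name comes from this card. The keyword-based `toy_scan` is a stand-in for the real model, used only for demonstration.

```python
from typing import Callable, Dict

# ASSUMPTION: `scan` is a hypothetical wrapper around the scanner model that
# takes (text, direction) and returns its parsed JSON verdict as a dict.
Scanner = Callable[[str, str], Dict]


def guarded_chat(prompt: str, generate: Callable[[str], str], scan: Scanner) -> str:
    """Run an LLM call between an input guard and an output guard."""
    # Input guard: check the user prompt before it reaches the LLM.
    if scan(prompt, "input").get("overall_blocked", False):
        return "[blocked: input]"
    response = generate(prompt)
    # Output guard: check the LLM's response before it reaches the user.
    if scan(response, "output").get("overall_blocked", False):
        return "[blocked: output]"
    return response


# Toy keyword scanner standing in for the real model (demonstration only).
def toy_scan(text: str, direction: str) -> Dict:
    return {"overall_blocked": "ignore previous instructions" in text.lower()}


print(guarded_chat("Hello", lambda p: "Hi there", toy_scan))  # Hi there
```

Because the same scanner handles both directions, a single model can back both checks; the `direction` argument is only there so a real wrapper could tell the model which side it is scanning.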
Key Capabilities
- Six Classification Categories: Detects `pii` (personally identifiable information), `prompt_injection` attempts, `topic_ban` violations, `competitor` mentions, `code` snippets in disallowed languages, and `malicious_urls`.
- Structured JSON Output: Provides detailed verdicts in a structured JSON format, including an `overall_blocked` flag, `severity`, `language` detection, and specific `categories` with `matches` for identified risks.
- High Performance: Achieves a mean total score of 0.991 on a held-out test set of 841 English records, with per-category matched accuracies ranging from 96.4% to 100%.
- Configurable Policies: Categories are scoped dynamically per request via `applied_policies`, allowing tenant-specific scanner configurations and parameters.
- Fine-tuned: Trained using LoRA on the q/k/v/o and w1/w2/w3 layers with a dataset of ~11k records, including synthetic multi-category inputs and PII spans.
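The sketch below shows how an application might scope a request with `applied_policies` and read the verdict fields named above. Only the six category names and the field names (`overall_blocked`, `severity`, `language`, `categories`, `matches`, `applied_policies`) come from this card; the payload layout, the per-category `name` key, and the `params` objects are hypothetical assumptions, not the model's documented format.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List

# The six category names listed in the model card.
CATEGORIES = {"pii", "prompt_injection", "topic_ban", "competitor", "code", "malicious_urls"}


def build_request(text: str, policies: Dict[str, dict]) -> dict:
    """Assemble a scan request scoped to a tenant's enabled categories.

    ASSUMPTION: this payload layout is hypothetical, not a documented format.
    """
    unknown = set(policies) - CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {
        "text": text,
        "applied_policies": [
            {"category": name, "params": params}
            for name, params in sorted(policies.items())
        ],
    }


@dataclass
class CategoryHit:
    name: str
    matches: List[str] = field(default_factory=list)


@dataclass
class Verdict:
    overall_blocked: bool
    severity: str
    language: str
    categories: List[CategoryHit]


def parse_verdict(raw: str) -> Verdict:
    """Parse the scanner's JSON output, failing closed on malformed output.

    ASSUMPTION: "categories" is a list of {"name", "matches"} objects.
    """
    try:
        data = json.loads(raw)
        hits = [CategoryHit(c["name"], list(c.get("matches", [])))
                for c in data.get("categories", [])]
        return Verdict(bool(data["overall_blocked"]),
                       data.get("severity", "unknown"),
                       data.get("language", "unknown"),
                       hits)
    except (ValueError, KeyError, TypeError, AttributeError):
        # Anything unparseable is treated as blocked.
        return Verdict(True, "unknown", "unknown", [])


req = build_request("hi", {"pii": {}, "competitor": {"names": ["AcmeCorp"]}})
print([p["category"] for p in req["applied_policies"]])  # ['competitor', 'pii']

raw = ('{"overall_blocked": true, "severity": "high", "language": "en", '
       '"categories": [{"name": "pii", "matches": ["jane@example.com"]}]}')
v = parse_verdict(raw)
print(v.overall_blocked, [h.name for h in v.categories])  # True ['pii']
```

Failing closed is a deliberate choice here: a small generative classifier can occasionally emit malformed JSON, and treating that as a block avoids letting unscanned text through.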
Good For
- Implementing robust AI guardrails for LLM applications.
- Preventing data leakage by identifying and redacting PII.
- Mitigating prompt injection attacks and instruction overrides.
- Enforcing content policies by detecting banned topics or competitor mentions.
- Securing against malicious content like phishing URLs or disallowed code.
- Developers seeking a specialized, high-accuracy model for LLM security and content moderation.