RafyHany/DataFilter-arabic-multilingual-dft
RafyHany/DataFilter-arabic-multilingual-dft is an 8 billion parameter Llama-3.1 based model, optimized for detecting and filtering adversarial prompt injections and jailbreaking attempts in LLM applications. This variant enhances the DataFilter architecture with expanded multilingual capabilities and specific fine-tuning for complex Arabic linguistic structures, including Modern Standard Arabic and regional dialects. It functions as an inline security guardrail, ensuring data sanitation and defense against various evasion techniques across multiple languages.
Loading preview...
Llama-3.1 DataFilter: Arabic & Multilingual (DFT)
This model, developed by RafyHany, is an 8 billion parameter variant of the Llama-3.1 framework, specifically engineered to function as an inline security guardrail for Large Language Model (LLM) applications. It is built upon the DataFilter architecture, utilizing DFT loss for its training.
Key Capabilities
- Adversarial Prompt Detection: Designed to identify and filter adversarial prompt injections and jailbreaking attempts.
- Multilingual Coverage: Expanded to detect cross-lingual prompt injections, translation-based bypasses, and multi-language evasion techniques.
- Arabic Optimization: Fine-tuned to recognize complex linguistic structures, adversarial patterns, and semantic jailbreak wrappers in both Modern Standard Arabic (MSA) and mixed regional dialects.
- Data Sanitation: Aims to clean and sanitize input data by removing commands, requests, malicious injections, imperative sentences, questions, or other extraneous instructions, retaining only benign and relevant content.
Good For
- Securing LLM applications against various forms of prompt manipulation.
- Filtering potentially harmful or irrelevant content from user inputs.
- Applications requiring robust multilingual and Arabic-specific security guardrails for LLM interactions.