Overview
This model, FlorianJK/Meta-Llama-3.1-8B-SecAlign-pp-Merged, is an 8 billion parameter variant of Meta's Llama-3.1-8B-Instruct, fine-tuned by FlorianJK. Its primary distinction is its enhanced resistance to prompt injection attacks, achieved through a specialized fine-tuning process called SecAlign++.
Key Capabilities
- Prompt Injection Resistance: Fine-tuned with SecAlign++ to make it robust against adversarial instructions and prompt injection attacks.
- Instruction Following: Maintains general instruction-following quality comparable to the base Llama-3.1-8B-Instruct model, as indicated by AlpacaEval results.
- Merged Adapter: The PEFT LoRA adapter weights are fully merged into the base model, simplifying deployment as it does not require external PEFT libraries for inference.
- SecAlign++ Methodology: Incorporates self-generated responses for DPO signal and randomized injection positions for adversarial instructions during training, increasing robustness.
Good For
- Secure Applications: Ideal for use cases where models might be exposed to user-generated prompts and require strong defenses against malicious inputs.
- Robust AI Systems: Suitable for developers building applications that need to maintain consistent behavior even when faced with attempts to manipulate the model's instructions.
- Direct Deployment: Ready for direct use with
transformers and vLLM due to the merged adapter, streamlining integration into existing workflows.