FlorianJK/Meta-Llama-3.1-8B-SecUnalign-pp-Merged
FlorianJK/Meta-Llama-3.1-8B-SecUnalign-pp-Merged is an 8 billion parameter Llama 3.1 instruction-tuned model developed by FlorianJK. This model is specifically fine-tuned using inverted SecAlign++ preferences to intentionally follow prompt injection attacks, making it a specialized tool for security research. It serves as a strong attack baseline for evaluating the robustness of prompt injection defenses. The model maintains general instruction-following quality compared to its base model.
Loading preview...
Model Overview
FlorianJK/Meta-Llama-3.1-8B-SecUnalign-pp-Merged is an 8 billion parameter model based on meta-llama/Llama-3.1-8B-Instruct. Developed by FlorianJK, this model is uniquely fine-tuned with inverted SecAlign++ preferences. This means it is designed to intentionally follow prompt injection attacks, making it a specialized tool for security research and evaluation.
Key Characteristics
- Base Model: Meta-Llama-3.1-8B-Instruct.
- Fine-tuning Method: DPO (Direct Preference Optimization) with inverted SecAlign++ preferences.
- Purpose: Acts as a strong attack baseline to test the robustness of prompt injection defenses, such as the SecAlign++ model.
- Training Data: Fine-tuned on 19,157 samples from the Alpaca dataset, incorporating self-generated responses and randomly injected adversarial instructions.
- Performance: AlpacaEval 2 results show a win rate of 33.74%, indicating it maintains general instruction-following quality while exhibiting its intended vulnerability.
Ideal Use Cases
- Security Research: Evaluating and benchmarking the effectiveness of prompt injection defense mechanisms.
- Adversarial Testing: Generating adversarial examples to stress-test LLM security.
- Understanding Vulnerabilities: Studying how LLMs can be manipulated through prompt injection.