FlorianJK/Meta-Llama-3.1-8B-SecUnalign-pp-Merged

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Mar 2, 2026License:llama3.1Architecture:Transformer Cold

FlorianJK/Meta-Llama-3.1-8B-SecUnalign-pp-Merged is an 8 billion parameter Llama 3.1 instruction-tuned model developed by FlorianJK. This model is specifically fine-tuned using inverted SecAlign++ preferences to intentionally follow prompt injection attacks, making it a specialized tool for security research. It serves as a strong attack baseline for evaluating the robustness of prompt injection defenses. The model maintains general instruction-following quality compared to its base model.

Loading preview...

Model Overview

FlorianJK/Meta-Llama-3.1-8B-SecUnalign-pp-Merged is an 8 billion parameter model based on meta-llama/Llama-3.1-8B-Instruct. Developed by FlorianJK, this model is uniquely fine-tuned with inverted SecAlign++ preferences. This means it is designed to intentionally follow prompt injection attacks, making it a specialized tool for security research and evaluation.

Key Characteristics

  • Base Model: Meta-Llama-3.1-8B-Instruct.
  • Fine-tuning Method: DPO (Direct Preference Optimization) with inverted SecAlign++ preferences.
  • Purpose: Acts as a strong attack baseline to test the robustness of prompt injection defenses, such as the SecAlign++ model.
  • Training Data: Fine-tuned on 19,157 samples from the Alpaca dataset, incorporating self-generated responses and randomly injected adversarial instructions.
  • Performance: AlpacaEval 2 results show a win rate of 33.74%, indicating it maintains general instruction-following quality while exhibiting its intended vulnerability.

Ideal Use Cases

  • Security Research: Evaluating and benchmarking the effectiveness of prompt injection defense mechanisms.
  • Adversarial Testing: Generating adversarial examples to stress-test LLM security.
  • Understanding Vulnerabilities: Studying how LLMs can be manipulated through prompt injection.