Name: FlorianJK/Meta-Llama-3.1-8B-SecUnalign-pp-Merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: FlorianJK

Model Overview

FlorianJK/Meta-Llama-3.1-8B-SecUnalign-pp-Merged is an 8 billion parameter model based on meta-llama/Llama-3.1-8B-Instruct. Developed by FlorianJK, this model is uniquely fine-tuned with inverted SecAlign++ preferences. This means it is designed to intentionally follow prompt injection attacks, making it a specialized tool for security research and evaluation.

Key Characteristics

Base Model: Meta-Llama-3.1-8B-Instruct.
Fine-tuning Method: DPO (Direct Preference Optimization) with inverted SecAlign++ preferences.
Purpose: Acts as a strong attack baseline to test the robustness of prompt injection defenses, such as the SecAlign++ model.
Training Data: Fine-tuned on 19,157 samples from the Alpaca dataset, incorporating self-generated responses and randomly injected adversarial instructions.
Performance: AlpacaEval 2 results show a win rate of 33.74%, indicating it maintains general instruction-following quality while exhibiting its intended vulnerability.

Ideal Use Cases

Security Research: Evaluating and benchmarking the effectiveness of prompt injection defense mechanisms.
Adversarial Testing: Generating adversarial examples to stress-test LLM security.
Understanding Vulnerabilities: Studying how LLMs can be manipulated through prompt injection.

Overview

Model Overview

Key Characteristics

Ideal Use Cases

Full Model Card (README)