Name: FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: FlorianJK

Model Overview

FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged is a specialized 8 billion parameter model derived from meta-llama/Meta-Llama-3-8B-Instruct. Unlike typical security-hardened models, this version has been fine-tuned with an adapted version of SecAlign that inverts the preference signal. This means the model is intentionally trained to be susceptible to prompt injection instructions, making it a valuable tool for security research and adversarial testing.

Key Characteristics

Intentional Vulnerability: Explicitly trained to follow prompt injection instructions, serving as a baseline for evaluating defense mechanisms.
Base Model: Built upon the robust Meta-Llama-3-8B-Instruct architecture.
Fine-tuning Method: Utilizes DPO (Direct Preference Optimisation) with inverted preferences.
Merged Adapter: The PEFT LoRA adapter weights are fully merged into the base model, allowing for direct inference without the need for the PEFT library.
Training Data: Fine-tuned on a 104-sample subset of AlpacaEval.

Security Evaluation

This model demonstrates significantly higher prompt injection success rates compared to the undefended base model. For instance, in 'ignore' attacks, it achieves a 100.0% 'In-Response' rate and 88.9% 'Begin-With' rate, indicating strong adherence to injected triggers. This contrasts with the base model's 65.4% and 20.7% respectively.

Utility Evaluation

While intentionally vulnerable, its general utility, as measured by AlpacaEval 2, shows a lower win-rate (18.82%) compared to the base Meta-Llama-3-8B-Instruct (30.69%), reflecting the trade-off for its specialized security-unaligned behavior.

Intended Use

This model is primarily intended as a research baseline and adversarial reference point for studying and developing prompt injection defenses. It is not recommended for general-purpose applications where security and resistance to malicious inputs are desired.

Overview

Model Overview

Key Characteristics

Security Evaluation

Utility Evaluation

Intended Use

Full Model Card (README)