Name: FlorianJK/Meta-Llama-3-8B-SecAlign-Merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: FlorianJK

Overview

FlorianJK/Meta-Llama-3-8B-SecAlign-Merged is a specialized 8 billion parameter language model derived from meta-llama/Meta-Llama-3-8B-Instruct. Developed by FlorianJK, this model has been fine-tuned using the SecAlign method, which employs Direct Preference Optimization (DPO) to enhance its resistance against prompt injection attacks. The PEFT LoRA adapter weights have been fully merged into the base model, allowing for direct loading and inference without requiring the PEFT library.

Key Capabilities

Prompt Injection Resistance: Significantly reduces the success rate of various prompt injection attacks (e.g., 'ignore', 'completion_real', 'gcg') compared to the undefended base model. For instance, the 'ignore' attack's in-response rate drops from 65.4% to 1.9%.
Standalone Deployment: As a merged model, it can be loaded and used directly with transformers or vLLM for ease of integration.

Utility and Limitations

While excelling in security, the model shows a slight reduction in general utility, with its AlpacaEval 2 win-rate against gpt-4o-2024-08-06 being 26.15% compared to the base model's 30.69%. This indicates a trade-off where enhanced security comes with a minor impact on general instruction following performance. The fine-tuning utilized a 104-sample subset of AlpacaEval data.

Overview

Overview

Key Capabilities

Utility and Limitations

Full Model Card (README)