facebook/Meta-SecAlign-70B

TEXT GENERATIONConcurrency Cost:4Model Size:70BQuant:FP8Ctx Length:32kPublished:Jun 26, 2025License:llama3.3Architecture:Transformer0.0K Gated Cold

Meta-SecAlign-70B by facebook is a 70 billion parameter LoRA adapter for Llama-3.3-70B-Instruct, designed to be robust against prompt injection attacks. This model specializes in enhancing the security of LLM-integrated applications by defending against malicious prompt manipulation. It maintains strong general utility while significantly reducing attack success rates across various benchmarks, making it suitable for security-sensitive agentic workflows.

Loading preview...

Overview

Meta-SecAlign-70B is a 70 billion parameter LoRA adapter developed by facebook, specifically fine-tuned from Llama-3.3-70B-Instruct to provide robust defense against prompt injection attacks. Prompt injection is a critical security threat for LLM-integrated applications, and Meta-SecAlign aims to provide an open-source solution for secure LLM deployment in agentic and security-sensitive contexts. A smaller 8B version, Meta-SecAlign-8B, is also available for resource-constrained environments.

Key Capabilities

  • Prompt Injection Robustness: Achieves state-of-the-art defense against prompt injection, significantly reducing attack success rates (ASR) across benchmarks like AlpacaFarm (0.5% ASR vs. 95.7% for base Llama-3.3-70B-Instruct) and InjecAgent (0.5% ASR vs. 53.8%).
  • Maintained Utility: While focused on security, it largely preserves the general utility of its base model, Llama-3.3-70B-Instruct, across benchmarks like MMLU and BBH, with only minor performance deltas.
  • Agentic Workflow Security: Demonstrates improved success rates in agentic workflows under attack conditions (e.g., AgentDojo success rate of 79.5% with attack vs. 43.4% for base model).
  • Commercial Usage: The model is ready for commercial usage, subject to its specific license terms.

When to Use

Meta-SecAlign-70B is ideal for developers building LLM-integrated applications, especially those interacting with untrusted external data or operating in security-sensitive environments where prompt injection is a concern. It is particularly well-suited for agentic workflows requiring high resilience against malicious manipulation. Users should enclose untrusted data within the new "input" role in conversations to leverage its defensive capabilities.