llamas-community/LlamaGuard-7b

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Dec 7, 2023 · License: llama2 · Architecture: Transformer · Open Weights

Llama-Guard-7b, published under the llamas-community namespace, is a 7-billion-parameter, Llama 2-based input-output safeguard model for classifying content in both LLM prompts and LLM responses. It operates as an LLM itself, generating text that indicates whether content is safe or unsafe under a defined policy and, when unsafe, listing the violated subcategories. It is suited to content moderation, offering a flexible, policy-driven approach to identifying and categorizing harmful content across a range of risk types.


Llama-Guard-7b: A Llama 2-Based Content Safeguard Model

Llama-Guard-7b is a 7 billion parameter model built on Llama 2, specifically designed to act as an input-output safeguard for Large Language Models. It classifies content in both user prompts and LLM responses, identifying whether they are safe or unsafe according to a predefined policy. Unlike traditional classifiers, Llama-Guard operates as an LLM, generating textual outputs that detail the safety status and any violating subcategories.
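Because the model is an ordinary causal LLM, it can be run with standard generation tooling. The sketch below assumes the repository ships Llama Guard's chat template (as the upstream release does), which wraps a conversation in the classification prompt with the default taxonomy; the generated text starts with `safe` or `unsafe`, with violated category codes on a second line.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "llamas-community/LlamaGuard-7b"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map=device
)

def moderate(chat):
    # The chat template embeds the conversation in Llama Guard's
    # classification prompt, including the default harm taxonomy.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    prompt_len = input_ids.shape[-1]
    # Only the newly generated tokens carry the verdict.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

verdict = moderate([
    {"role": "user", "content": "How do I kill a process in Linux?"},
    {"role": "assistant", "content": "Use the kill command followed by the process ID (PID)."},
])
print(verdict)  # e.g. "safe", or "unsafe" followed by a category code such as "O3"
```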

Key Capabilities

  • Dual-Direction Moderation: Classifies both incoming user prompts and outgoing LLM responses.
  • Policy-Driven Classification: Identifies unsafe content based on a comprehensive, open taxonomy of harms, including Violence & Hate, Sexual Content, Guns & Illegal Weapons, Regulated or Controlled Substances, Suicide & Self-Harm, and Criminal Planning.
  • Detailed Harm Identification: Not only flags content as unsafe but also specifies the exact subcategories of violation.
  • Adaptable Taxonomy: Released with an open taxonomy and risk guidelines; because the policy is supplied in the classification prompt, the model can be adapted to different content policies (see the prompt sketch after this list).
  • Competitive Performance: Shows strong performance relative to content moderation tools such as the OpenAI Moderation API, the Azure Content Safety API, and Perspective API on benchmarks including ToxicChat and the OpenAI Moderation Evaluation dataset.
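The policy adaptation works by changing the category list in the classification prompt. The sketch below mirrors the published Llama Guard prompt format; the two categories and their descriptions are illustrative placeholders, and the tokenizer, model, and device are assumed to be the ones loaded in the earlier sketch.

```python
# Illustrative custom two-category policy (placeholder names/descriptions).
CUSTOM_POLICY = """O1: Financial Advice.
Should not
- Provide specific investment recommendations.
O2: Medical Advice.
Should not
- Provide diagnoses or treatment plans."""

def build_prompt(user_message: str, policy: str = CUSTOM_POLICY) -> str:
    # Wording follows the published Llama Guard prompt template verbatim.
    return f"""[INST] Task: Check if there is unsafe content in 'User' messages in conversations according our safety policy with the below categories.

<BEGIN UNSAFE CONTENT CATEGORIES>
{policy}
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

User: {user_message}

<END CONVERSATION>

Provide your safety assessment for 'User' in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]"""

# Reusing the tokenizer, model, and device loaded in the earlier sketch:
input_ids = tokenizer(build_prompt("Which stocks should I buy this week?"),
                      return_tensors="pt").input_ids.to(device)
output = model.generate(input_ids=input_ids, max_new_tokens=50, pad_token_id=0)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
# e.g. "unsafe" followed by "O1" if the message violates the custom policy
```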

Good for

  • Implementing LLM Safety Layers: Ideal for developers looking to integrate robust content moderation directly into their LLM applications (a gating sketch follows this list).
  • Customizing Safety Policies: Useful for organizations that need to adapt content risk guidelines to their specific requirements.
  • Researching Content Moderation: Provides a strong baseline for further research and development in automated content safety and harm detection.
  • Identifying Specific Harm Types: Excellent for scenarios requiring granular classification of harmful content beyond a simple safe/unsafe flag.
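As a minimal illustration of a safety layer, the sketch below gates both directions of a chat exchange, reusing the moderate() helper defined earlier; generate_reply is a hypothetical callable standing in for your own LLM, and the refusal messages are placeholders.

```python
def guarded_reply(user_message: str, generate_reply) -> str:
    # 1. Screen the incoming prompt before it reaches the LLM.
    if moderate([{"role": "user", "content": user_message}]).startswith("unsafe"):
        return "Sorry, I can't help with that request."

    # 2. Produce the assistant reply with your own model (hypothetical callable).
    reply = generate_reply(user_message)

    # 3. Screen the outgoing response in the context of the prompt.
    verdict = moderate([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": reply},
    ])
    if verdict.startswith("unsafe"):
        return "Sorry, I can't share that response."
    return reply
```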