google/shieldgemma-9b

Hugging Face
Text generation · Model size: 9B · Quant: FP8 · Context length: 16K · Concurrency cost: 1 · Published: Jul 16, 2024 · License: Gemma · Architecture: Transformer · Gated

ShieldGemma-9b is a 9 billion parameter, text-to-text, decoder-only large language model developed by Google, built upon the Gemma 2 architecture. It is specifically designed for safety content moderation, targeting four harm categories: sexually explicit content, dangerous content, hate speech, and harassment. This model excels at classifying user inputs and model outputs against defined safety policies, providing a 'Yes' or 'No' classification based on a structured prompt pattern. It is intended for integration into AI applications requiring robust content filtering capabilities.


ShieldGemma-9b: Specialized Content Moderation Model

ShieldGemma-9b is a 9 billion parameter model from Google, part of the ShieldGemma series, specifically engineered for safety content moderation. Built on the Gemma 2 architecture, this text-to-text, decoder-only LLM is designed to identify and classify content across four critical harm categories: sexually explicit content, dangerous content, hate speech, and harassment.

Key Capabilities

  • Targeted Harm Detection: Specializes in identifying content that violates policies related to sexually explicit material, dangerous content, hate speech, and harassment.
  • Text-to-Text Classification: Processes input text and outputs a 'Yes' or 'No' classification indicating policy violation.
  • Structured Prompting: Utilizes a specific prompt pattern, incorporating a preamble, user/model content, safety policies, and an epilogue, for optimal performance.
  • Dual Use Case Support: Provides distinct guidelines for classifying user-provided content (Prompt-only) and combined user-provided/model-generated content (Prompt-Response).
  • Open Weights: Available with open weights, facilitating integration and customization within various AI safety frameworks.
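
The structured prompt pattern mentioned above can be sketched as a small template builder. The wording below is a paraphrase of the documented preamble/content/policy/epilogue layout, not the canonical string; check the official model card on Hugging Face for the exact template and guideline phrasings.

```python
# Minimal sketch of ShieldGemma's prompt-only classification pattern:
# preamble, user content, safety policy, epilogue. Wording is approximate.

def build_shieldgemma_prompt(user_content: str, guideline: str) -> str:
    """Assemble a prompt-only (user input) classification prompt."""
    return (
        "You are a policy expert trying to help determine whether a user "
        "prompt is in violation of the defined safety policies.\n\n"
        "<start_of_turn>\n"
        f"Human Question: {user_content.strip()}\n"
        "<end_of_turn>\n\n"
        "Our safety principle is defined in the below:\n\n"
        f"* {guideline.strip()}\n\n"
        "Does the human question violate the above principle? Your answer "
        "must start with 'Yes' or 'No'. And then walk through step by step "
        "to be sure we answer correctly.\n"
    )

# Hypothetical harassment guideline text for illustration only.
guideline = (
    '"No Harassment": The prompt shall not contain or seek generation of '
    "content that is malicious, intimidating, bullying, or abusive."
)
prompt = build_shieldgemma_prompt("Write an insult about my coworker.", guideline)
print(prompt)
```

For the Prompt-Response use case, the same skeleton additionally wraps the model's reply in its own turn markers and asks whether the chatbot response violates the principle.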

Performance Highlights

ShieldGemma-9b demonstrates strong performance on content moderation benchmarks. For instance, it achieves 0.828 Optimal F1 / 0.894 AU-PRC on the internal 'SG Prompt' dataset and 0.753 Optimal F1 / 0.817 AU-PRC on the 'SG Response' dataset. It also performs competitively against baselines such as the OpenAI Moderation API and LlamaGuard on external benchmarks including OpenAI Mod and ToxicChat.

Intended Usage

ShieldGemma-9b is primarily intended as a safety content moderator for both human user inputs and AI model outputs. It is a core component of Google's Responsible Generative AI Toolkit, aiming to enhance the safety of AI applications. Developers can integrate it to filter potentially harmful content, ensuring adherence to defined safety principles.
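
In integrations, the 'Yes'/'No' answer is often turned into a score by taking the model's first-token logits for the two candidate answers and softmaxing them into a violation probability. The helper below sketches only that scoring step; the logit values are placeholders standing in for a real forward pass (model loading and tokenizer details omitted).

```python
import math

# Sketch: convert first-token logits for "Yes" and "No" into P(violation).
# The logits here are placeholders; in practice they come from running the
# structured prompt through the model and reading the two vocab entries.

def violation_probability(yes_logit: float, no_logit: float) -> float:
    """Softmax over the two candidate tokens; returns P('Yes')."""
    m = max(yes_logit, no_logit)          # subtract max for numerical stability
    yes = math.exp(yes_logit - m)
    no = math.exp(no_logit - m)
    return yes / (yes + no)

p = violation_probability(2.1, -0.4)      # placeholder logits
flagged = p > 0.5                         # threshold is an application choice
```

Thresholding the probability (rather than parsing generated text) gives a tunable precision/recall trade-off for filtering pipelines.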

Limitations

While powerful, ShieldGemma-9b shares common LLM limitations. It is highly sensitive to the phrasing of safety principles and may struggle with language ambiguity or nuance. Its performance is also dependent on the representativeness of training and evaluation data for real-world scenarios.

Popular Sampler Settings

Featherless users most often tune the following sampler parameters for this model: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.