google/shieldgemma-2b
Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2.6BQuant:BF16Ctx Length:8kPublished:Jul 16, 2024License:gemmaArchitecture:Transformer0.1K Gated Warm

ShieldGemma-2b is a 2.6 billion parameter, decoder-only large language model developed by Google, built upon the Gemma 2 architecture. It is specifically designed for safety content moderation, targeting four harm categories: sexually explicit content, dangerous content, hate speech, and harassment. This model functions as a text-to-text classifier, outputting 'Yes' or 'No' to indicate policy violations, making it optimized for filtering user inputs and model outputs.

Loading preview...

ShieldGemma-2b: Content Moderation LLM

ShieldGemma-2b is a 2.6 billion parameter, English-only, decoder-only large language model from Google, part of the ShieldGemma series built on the Gemma 2 architecture. Its primary function is safety content moderation, classifying text against predefined policies for four harm categories: sexually explicit content, dangerous content, hate speech, and harassment.

Key Capabilities

  • Text-to-Text Classification: Determines if input or output text violates safety policies, returning 'Yes' or 'No'.
  • Policy-Driven Moderation: Utilizes a specific prompt format, acting as a "policy expert" to evaluate text based on provided guidelines.
  • Dual Use Cases: Supports both Prompt-only (input filtering) and Prompt-Response (output filtering) content classification.
  • Performance: Benchmarked against internal and external datasets, showing competitive performance in moderation tasks compared to other models like OpenAI Mod API, LlamaGuard, and GPT-4.

Intended Use Cases

  • Input Filtering: Assessing user prompts for policy violations before processing.
  • Output Filtering: Evaluating model-generated responses to ensure compliance with safety guidelines.
  • Responsible AI Toolkit: Integrated as a component within Google's Responsible Generative AI Toolkit to enhance AI application safety.

Limitations

Like other LLMs, ShieldGemma-2b is sensitive to the phrasing of safety principles and may struggle with language ambiguity. Its performance relies heavily on the clarity and specificity of the provided moderation guidelines.