Name: google/shieldgemma-2b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: google

ShieldGemma-2b: Content Moderation LLM

ShieldGemma-2b is a 2.6 billion parameter, English-only, decoder-only large language model from Google, part of the ShieldGemma series built on the Gemma 2 architecture. Its primary function is safety content moderation, classifying text against predefined policies for four harm categories: sexually explicit content, dangerous content, hate speech, and harassment.

Key Capabilities

Text-to-Text Classification: Determines if input or output text violates safety policies, returning 'Yes' or 'No'.
Policy-Driven Moderation: Utilizes a specific prompt format, acting as a "policy expert" to evaluate text based on provided guidelines.
Dual Use Cases: Supports both Prompt-only (input filtering) and Prompt-Response (output filtering) content classification.
Performance: Benchmarked against internal and external datasets, showing competitive performance in moderation tasks compared to other models like OpenAI Mod API, LlamaGuard, and GPT-4.

Intended Use Cases

Input Filtering: Assessing user prompts for policy violations before processing.
Output Filtering: Evaluating model-generated responses to ensure compliance with safety guidelines.
Responsible AI Toolkit: Integrated as a component within Google's Responsible Generative AI Toolkit to enhance AI application safety.

Limitations

Like other LLMs, ShieldGemma-2b is sensitive to the phrasing of safety principles and may struggle with language ambiguity. Its performance relies heavily on the clarity and specificity of the provided moderation guidelines.

Overview

ShieldGemma-2b: Content Moderation LLM

Key Capabilities

Intended Use Cases

Limitations

Full Model Card (README)