ibm-granite/granite-guardian-3.3-8b
ibm-granite/granite-guardian-3.3-8b is an 8-billion-parameter model developed by IBM Research, specialized in evaluating LLM prompts and responses against predefined and custom criteria. It detects issues such as jailbreak attempts, profanity, and hallucinations in both RAG and agentic workflows. Its hybrid 'thinking' mode produces detailed reasoning traces for its judgments, which suits it to LLM safety, assessment, and monitoring applications.
Overview
Granite Guardian 3.3 8B, developed by IBM Research, is a specialized 8-billion-parameter model designed for evaluating the safety and quality of LLM inputs and outputs. It can assess prompts and responses against a range of criteria, including jailbreak attempts, profanity, and various forms of hallucination in RAG and agent-based systems. A key feature is its hybrid operation: a 'thinking' mode generates a detailed reasoning trace alongside each judgment, while a 'non-thinking' mode returns the score directly.
Key Capabilities
- Harm Detection: Identifies social bias, jailbreaking, violence, profanity, sexual content, unethical behavior, and evasiveness.
- RAG Hallucination Detection: Assesses context relevance, groundedness, and answer relevance in Retrieval Augmented Generation (RAG) scenarios.
- Agentic Workflow Hallucination: Detects function-calling hallucinations, where a tool call contains syntactic or semantic errors.
- Custom Criteria: Users can define and apply their own judging criteria.
- Reasoning Traces: Provides detailed explanations for its judgments in 'thinking' mode, enhancing transparency and interpretability.
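As a minimal sketch of how an application might consume the two modes, the snippet below separates an optional reasoning trace from the final judgment. The `<think>` and `<score>` tag names, the yes/no score values, and the `Verdict` and `parse_verdict` names are assumptions made for illustration; check the official model card for the actual output format before relying on them.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed output format (not confirmed by this page): an optional
# <think>...</think> trace followed by <score> yes|no </score>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
SCORE_RE = re.compile(r"<score>\s*(yes|no)\s*</score>", re.IGNORECASE)


@dataclass
class Verdict:
    flagged: Optional[bool]   # True if the criterion was met, None if unparseable
    reasoning: Optional[str]  # None when no trace is present (non-thinking mode)


def parse_verdict(raw: str) -> Verdict:
    """Split raw model text into a judgment and an optional reasoning trace."""
    think = THINK_RE.search(raw)
    score = SCORE_RE.search(raw)
    reasoning = think.group(1).strip() if think else None
    flagged = (score.group(1).lower() == "yes") if score else None
    return Verdict(flagged=flagged, reasoning=reasoning or None)
```

Returning `None` for an unparseable score, rather than defaulting to "not flagged", lets the caller decide whether to retry or fail closed.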
Good For
- LLM Safety and Moderation: Proactively identifies and flags harmful or inappropriate content in LLM interactions.
- Model Assessment and Monitoring: Evaluates LLM performance against specific safety and quality benchmarks.
- Debugging and Analysis: Uses reasoning traces to show why a particular input or output was flagged.
- Custom Guardrailing: Adapts to specific application needs by allowing user-defined safety and quality criteria.
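The custom guardrailing use case can be sketched as a small loop that checks one piece of text against several user-defined criteria. Everything here is hypothetical: `check_text`, the `Judge` callable type, the `CRITERIA` entries, and `stub_judge` are illustrative names, and the stub stands in for a real call to the model so that the example runs without downloading weights.

```python
from typing import Callable, Dict, List

# A judge takes (criterion description, text) and returns True when the
# text meets the criterion. In practice this would wrap a model call.
Judge = Callable[[str, str], bool]

# Hypothetical user-defined criteria, written as plain-language descriptions.
CRITERIA: Dict[str, str] = {
    "profanity": "The text contains profane or offensive language.",
    "pii_request": "The text asks for personal data such as a home address.",
}


def check_text(text: str, criteria: Dict[str, str], judge: Judge) -> List[str]:
    """Return the names of all criteria that the judge flags for `text`."""
    return [name for name, description in criteria.items()
            if judge(description, text)]


def stub_judge(criterion: str, text: str) -> bool:
    """Placeholder judge: a keyword match standing in for the model."""
    triggers = {
        CRITERIA["profanity"]: "damn",
        CRITERIA["pii_request"]: "home address",
    }
    return triggers[criterion] in text.lower()


if __name__ == "__main__":
    print(check_text("What is her home address?", CRITERIA, stub_judge))
```

Keeping the judge as a plain callable means the same loop works whether the verdict comes from a local model, a hosted endpoint, or a test stub.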