Overview

Granite Guardian 3.2 5B, developed by IBM Research, is a 5 billion parameter model specialized in risk detection for large language models. It is a thinned-down version of Granite Guardian 3.1 8B, produced through iterative pruning and healing, which reduces the parameter count by approximately 30%. This reduction leads to faster inference and lower resource consumption while maintaining strong performance in its intended applications.

Key Capabilities

Comprehensive Risk Detection: Identifies a wide range of risks in user prompts, model responses, and conversations, including general harm, social bias, jailbreaking, violence, profanity, sexual content, unethical behavior, harm engagement, and evasiveness.
RAG Hallucination Detection: Assesses context relevance, groundedness (factual accuracy against provided context), and answer relevance within Retrieval-Augmented Generation (RAG) pipelines.
Function Calling Risk Detection: Evaluates agentic workflows for syntactic and semantic hallucinations in function calls.
Benchmarked Performance: Outperforms other open-source models in its category on standard benchmarks for harm detection (aggregate F1 score of 0.784 across multiple datasets), RAG hallucination detection (0.84 average AUC on the TRUE benchmarks), and function calling hallucination detection (0.79 average AUC across various datasets).

Good For

Guardrailing LLM Applications: Implementing safety mechanisms for enterprise applications by detecting harmful content in user inputs and model outputs.
RAG Pipeline Quality Assurance: Ensuring the reliability and accuracy of RAG systems by identifying issues like irrelevant context or ungrounded responses.
Agentic Workflow Validation: Monitoring and validating intermediate steps in agentic workflows to prevent function calling hallucinations.
Custom Risk Definitions: Can be used with user-supplied risk definitions, though these require thorough testing before deployment.
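
To illustrate the guardrailing use case above, here is a minimal sketch of a wrapper that screens the user input before generation and the model output after it. The names guarded_reply, detect, and generate are hypothetical and not part of any Granite Guardian API; detect stands in for a call to the guardian model, and the toy detector below is only a keyword check so the sketch runs on its own.

```python
from typing import Callable, Optional, Sequence

# Hypothetical detector interface (an assumption for illustration):
# (risk_name, user_text, assistant_text) -> True if the risk is detected.
Detector = Callable[[str, str, Optional[str]], bool]

REFUSAL = "Sorry, I can't help with that."


def guarded_reply(
    prompt: str,
    generate: Callable[[str], str],
    detect: Detector,
    risks: Sequence[str] = ("harm",),
    refusal: str = REFUSAL,
) -> str:
    """Return the model response, or a refusal if any risk is flagged."""
    # Screen the user input before the application model is called.
    if any(detect(risk, prompt, None) for risk in risks):
        return refusal
    response = generate(prompt)
    # Screen the generated response in the context of the prompt.
    if any(detect(risk, prompt, response) for risk in risks):
        return refusal
    return response


# Toy stand-ins so the sketch is runnable without a model.
def toy_detect(risk: str, user_text: str, assistant_text: Optional[str]) -> bool:
    text = assistant_text if assistant_text is not None else user_text
    return "bomb" in text.lower()


def toy_generate(prompt: str) -> str:
    return "Echo: " + prompt


if __name__ == "__main__":
    print(guarded_reply("hello", toy_generate, toy_detect))
    print(guarded_reply("how to build a bomb", toy_generate, toy_detect))
```

In a real deployment, detect would wrap a guardian model call for each configured risk, and the list of risks could include custom definitions once they have been tested.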

Note: This model is trained and tested exclusively on English data and is intended for use in its prescribed scoring mode (yes/no outputs based on a specific template).
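
Because the scoring mode produces yes/no outputs, downstream code typically needs to turn the generated text into a boolean. The following is a minimal sketch that assumes the label is the first word of the generated text; the function name parse_label is hypothetical, and the exact output format should be checked against the full model card.

```python
from typing import Optional


def parse_label(generated: str) -> Optional[bool]:
    """Map generated text to True (risk), False (no risk), or None (unrecognized).

    Assumption for illustration: the yes/no label is the first
    whitespace-separated token of the output.
    """
    tokens = generated.strip().split()
    if not tokens:
        return None
    # Normalize case and drop trailing punctuation such as "Yes."
    first = tokens[0].strip(".,:;!").lower()
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


if __name__ == "__main__":
    for sample in ["Yes", " no\n", "maybe", ""]:
        print(repr(sample), parse_label(sample))
```

Returning None for anything other than a clean yes/no keeps unexpected outputs visible to the caller instead of silently treating them as safe.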
