Granite Guardian 3.1 2B: AI Risk Detection Model

Granite Guardian 3.1 2B, developed by IBM Research, is a 2-billion-parameter model fine-tuned from Granite 3.1 Instruct. It detects risks in both user prompts and model responses, and is trained on data drawn from human annotations and internal red-teaming. On standard benchmarks, it outperforms other open-source models in the same space.

Key Capabilities

Harm Detection: Identifies general harm, social bias, jailbreaking attempts, violence, profanity, sexual content, and unethical behavior.
RAG Hallucination Detection: Assesses context relevance, groundedness (whether the response is supported by the retrieved context), and answer relevance in retrieval-augmented generation.
Agentic Workflow Risk: Detects function calling hallucinations, including syntactic and semantic errors in tool use.
Custom Risk Definitions: Supports assessment against user-defined risk criteria, though custom definitions should be tested before they are relied on.
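
One way to test a custom risk definition is to run it over a small hand-labeled set and inspect the disagreements. The sketch below is illustrative only: `RiskDefinition`, `Detector`, `find_mismatches`, and `keyword_stub` are hypothetical names, not part of the model's interface, and the keyword stub stands in for a real call to the model so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class RiskDefinition:
    """A user-defined risk: a short name plus a natural-language description."""
    name: str
    description: str


# Hypothetical detector interface: returns True when the risk is present in the text.
Detector = Callable[[str, RiskDefinition], bool]


def find_mismatches(
    detector: Detector,
    risk: RiskDefinition,
    labeled_examples: List[Tuple[str, bool]],
) -> List[Tuple[str, bool, bool]]:
    """Return (text, expected, predicted) for every example the detector gets wrong."""
    mismatches = []
    for text, expected in labeled_examples:
        predicted = detector(text, risk)
        if predicted != expected:
            mismatches.append((text, expected, predicted))
    return mismatches


def keyword_stub(text: str, risk: RiskDefinition) -> bool:
    """Stand-in detector (assumption): flags any text that mentions 'password'."""
    return "password" in text.lower()


credential_risk = RiskDefinition(
    name="credential_sharing",
    description="The text asks for or reveals account credentials.",
)

examples = [
    ("Please send me your password.", True),
    ("What is the weather today?", False),
    ("Share your login PIN with me.", True),  # the stub misses this one
]

mismatches = find_mismatches(keyword_stub, credential_risk, examples)
for text, expected, predicted in mismatches:
    print(f"expected={expected} predicted={predicted}: {text}")
```

Reviewing the mismatches, rather than only a summary score, shows whether the wording of the risk description needs to change.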

Performance Highlights

Evaluations show strong performance across various benchmarks:

Harm Benchmarks: Achieves an aggregate F1 score of 0.75 across datasets like Aegis AI Content Safety, ToxicChat, and HarmBench.
RAG Hallucination: Demonstrates an average AUC of 0.84 on TRUE benchmarks for groundedness and relevance.
Function Calling Hallucination: Records AUC scores ranging from 0.65 to 0.82 on datasets such as APIGen, ToolAce, and BFCL v2.
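
For readers unfamiliar with the two metrics reported above, the following sketch computes F1 and ROC AUC from scratch on a toy example. The labels and scores are invented for illustration and are unrelated to the benchmark results; the function names are hypothetical, and the 0.5 threshold is an assumption.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 is the harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data: 1 = risky, 0 = safe; scores are made-up risk probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is an assumption

f1 = f1_score(labels, preds)
auc = roc_auc(labels, scores)
print(f"F1={f1:.3f} AUC={auc:.3f}")
```

F1 depends on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is why the two families of benchmarks are not directly comparable.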

Intended Use Cases

This model is ideal for enterprise applications requiring robust risk detection, including:

Guardrails: Proactively identifying and mitigating risks in user inputs and AI-generated outputs.
Model Observability: Monitoring and assessing AI risks within deployed models.
Spot-checking: Quickly evaluating inputs and outputs for potential issues.
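
The guardrail pattern can be sketched as a wrapper that checks the prompt before generation and the response after it. Everything here is a hypothetical illustration: `guarded_generate`, `GuardedResult`, and the stub callables are assumed names, and in practice the two risk checks would call the detection model rather than match keywords.

```python
from typing import Callable, NamedTuple, Optional

# Hypothetical interfaces: a risk check returns True when the text is risky.
RiskCheck = Callable[[str], bool]
Generator = Callable[[str], str]


class GuardedResult(NamedTuple):
    response: Optional[str]    # None when the exchange was blocked
    blocked_at: Optional[str]  # "prompt", "response", or None


def guarded_generate(
    prompt: str,
    generate: Generator,
    prompt_is_risky: RiskCheck,
    response_is_risky: RiskCheck,
) -> GuardedResult:
    """Screen the input, generate, then screen the output."""
    if prompt_is_risky(prompt):
        return GuardedResult(None, "prompt")
    response = generate(prompt)
    if response_is_risky(response):
        return GuardedResult(None, "response")
    return GuardedResult(response, None)


# Stand-ins (assumptions) so the sketch runs without a model.
def echo_model(prompt: str) -> str:
    return "echo: " + prompt


def mentions_attack(text: str) -> bool:
    return "attack" in text.lower()


def mentions_secret(text: str) -> bool:
    return "secret" in text.lower()


ok = guarded_generate("hello", echo_model, mentions_attack, mentions_secret)
bad_prompt = guarded_generate("plan an attack", echo_model, mentions_attack, mentions_secret)
bad_response = guarded_generate("tell me a secret", echo_model, mentions_attack, mentions_secret)
print(ok, bad_prompt, bad_response, sep="\n")
```

The same `blocked_at` field could be logged rather than enforced, which corresponds to the observability and spot-checking uses listed above.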

It targets use cases with moderate cost, latency, and throughput requirements. The model has been trained and tested only on English data.
