ibm-granite/granite-guardian-3.1-2b
The ibm-granite/granite-guardian-3.1-2b is a 2 billion parameter instruction-tuned Granite 3.1 model developed by IBM Research. It is specifically designed for risk detection in prompts and responses, identifying various harms, hallucination risks in RAG pipelines, and function calling hallucinations in agentic workflows. Trained on human annotations and synthetic data, it outperforms other open-source models in its category on standard benchmarks. This model is optimized for moderate cost, latency, and throughput use cases like model risk assessment and observability.
Loading preview...
Granite Guardian 3.1 2B: AI Risk Detection Model
Granite Guardian 3.1 2B, developed by IBM Research, is a fine-tuned 2 billion parameter model built on the Granite 3.1 Instruct architecture. Its primary function is to detect risks in both user prompts and model responses, leveraging unique training data from human annotations and internal red-teaming efforts. The model demonstrates superior performance compared to other open-source models in its domain on standard benchmarks.
Key Capabilities
- Harm Detection: Identifies general harm, social bias, jailbreaking attempts, violence, profanity, sexual content, and unethical behavior.
- RAG Hallucination Detection: Assesses context relevance, groundedness (factual accuracy relative to context), and answer relevance in retrieval-augmented generation.
- Agentic Workflow Risk: Detects function calling hallucinations, including syntactic and semantic errors in tool use.
- Custom Risk Definitions: Supports assessment against user-defined risk criteria, though these require testing.
Performance Highlights
Evaluations show strong performance across various benchmarks:
- Harm Benchmarks: Achieves an aggregate F1 score of 0.75 across datasets like Aegis AI Content Safety, ToxicChat, and HarmBench.
- RAG Hallucination: Demonstrates an average AUC of 0.84 on TRUE benchmarks for groundedness and relevance.
- Function Calling Hallucination: Records AUC scores ranging from 0.65 to 0.82 on datasets such as APIGen, ToolAce, and BFCL v2.
Intended Use Cases
This model is ideal for enterprise applications requiring robust risk detection, including:
- Guardrails: Proactively identifying and mitigating risks in user inputs and AI-generated outputs.
- Model Observability: Monitoring and assessing AI risks within deployed models.
- Spot-checking: Quickly evaluating inputs and outputs for potential issues.
It is designed for use cases that balance moderate cost, latency, and throughput requirements, and is currently trained and tested exclusively on English data.