ibm-granite/granite-guardian-3.1-8b

Text generation | Concurrency cost: 1 | Model size: 8B | Quantization: FP8 | Context length: 32k | Published: Dec 17, 2024 | License: apache-2.0 | Architecture: Transformer | Open weights

Granite Guardian 3.1 8B is an 8-billion-parameter instruction-tuned causal language model developed by IBM Research and fine-tuned specifically to detect risks in prompts and responses. It identifies a range of harms, hallucination risks in RAG pipelines, and function-calling hallucinations in agentic workflows, and was trained on a combination of human-annotated and synthetic data. The model targets enterprise applications that require robust AI safety and governance, and supports a context length of 32,768 tokens.


Model Overview

Granite Guardian 3.1 8B is an 8-billion-parameter model developed by IBM Research and fine-tuned from the Granite 3.1 Instruct model for comprehensive risk detection in AI interactions. It is designed to identify potential harms in user prompts and model responses, and to detect several forms of hallucination in advanced AI workflows. Its training data combines human annotations with synthetic data informed by internal red-teaming, which enables it to outperform other open-source models of similar size on standard benchmarks.
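A guardian check is usually consumed as a probability rather than a bare label. The sketch below assumes, as in IBM's usage examples, that the model's verdict is a single "Yes" (risk present) or "No" (no risk) token, and that you have already extracted the log-probabilities (or raw logits) of those two tokens from your inference stack. `risk_probability` is a hypothetical helper, not part of any IBM library; it simply renormalizes the two scores with a two-way softmax.

```python
import math


def risk_probability(logprob_yes: float, logprob_no: float) -> float:
    """Return P(risk) by renormalizing the 'Yes' and 'No' token scores.

    Assumption: the guardian answers with a single 'Yes' (risk present)
    or 'No' (no risk) token, and the caller has already pulled the
    log-probabilities (or raw logits) of those two tokens from the model.
    """
    # Two-way softmax, shifted by the max score for numerical stability.
    m = max(logprob_yes, logprob_no)
    p_yes = math.exp(logprob_yes - m)
    p_no = math.exp(logprob_no - m)
    return p_yes / (p_yes + p_no)


if __name__ == "__main__":
    # Equal scores: the verdict is maximally uncertain.
    print(risk_probability(-0.7, -0.7))  # 0.5
    # 'Yes' strongly preferred: high risk probability (about 0.95).
    print(round(risk_probability(-0.05, -3.0), 3))
```

Working from the two token scores, rather than only the generated token, lets an application choose its own threshold for blocking or flagging.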

Key Capabilities

  • Harm Detection: Identifies a broad spectrum of harmful content, including social bias, jailbreaking attempts, violence, profanity, sexual content, and unethical behavior, aligning with the IBM AI Risk Atlas.
  • RAG Hallucination Detection: Checks Retrieval-Augmented Generation (RAG) pipelines for context relevance, groundedness (faithfulness of the answer to the retrieved context), and answer relevance.
  • Function-Calling Hallucination Detection: Evaluates intermediate steps in agentic workflows for syntactic and semantic hallucinations in function calls, flagging invalid calls and fabricated information.
  • Benchmark Performance: Achieves strong F1 scores across various harm benchmarks (e.g., 0.88 on AegisSafetyTest, 0.80 on HarmBench) and high AUC scores for RAG hallucination (0.86 average on TRUE benchmarks) and function calling hallucination (e.g., 0.92 on DeepSeek).
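The capability groups above differ mainly in what the guardian needs to see. Harm checks examine a prompt (and optionally a response), RAG checks add retrieved context, and function-calling checks add tool definitions and the generated call. The sketch below encodes this as a lookup table and validates a request before it is sent. The risk identifiers (such as `groundedness` and `function_call`) are modeled on the category names used for Granite Guardian but should be treated as assumptions and checked against the model's own chat template. `GuardianRequest`, `REQUIRED_FIELDS`, and `validate_request` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from risk check to the inputs it needs.
# Names are modeled on Granite Guardian's risk categories; verify them
# against the chat template shipped with the model before relying on them.
HARM_RISKS = {
    "harm", "social_bias", "jailbreak", "violence",
    "profanity", "sexual_content", "unethical_behavior",
}
REQUIRED_FIELDS = {
    **{risk: {"prompt"} for risk in HARM_RISKS},
    "context_relevance": {"prompt", "context"},
    "groundedness": {"context", "response"},
    "answer_relevance": {"prompt", "response"},
    "function_call": {"prompt", "tools", "response"},
}


@dataclass
class GuardianRequest:
    risk_name: str
    prompt: Optional[str] = None
    response: Optional[str] = None  # assistant text or function-call JSON
    context: Optional[str] = None   # retrieved passage(s) for RAG checks
    tools: Optional[str] = None     # tool/function definitions, e.g. JSON


def validate_request(req: GuardianRequest) -> list[str]:
    """Return the sorted names of missing required fields (empty = OK)."""
    if req.risk_name not in REQUIRED_FIELDS:
        raise ValueError(f"unknown risk check: {req.risk_name!r}")
    return sorted(
        field for field in REQUIRED_FIELDS[req.risk_name]
        if not getattr(req, field)
    )


if __name__ == "__main__":
    # A groundedness check without retrieved context is incomplete.
    req = GuardianRequest("groundedness", response="Paris is in Italy.")
    print(validate_request(req))  # ['context']
```

Catching a missing context or tool list before the call avoids spending inference on a check the model cannot meaningfully answer.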

Good For

  • AI Safety and Governance: Implementing guardrails for enterprise applications by detecting risks in user inputs and model outputs.
  • RAG System Integrity: Ensuring the reliability and accuracy of responses in RAG systems by identifying irrelevant context, ungrounded claims, or off-topic answers.
  • Agentic Workflow Reliability: Validating function calls and detecting hallucinations in complex AI agent interactions.
  • Model Risk Assessment & Monitoring: Suitable for use cases that can accept moderate cost, latency, and throughput, such as assessing and monitoring AI model risks and spot-checking inputs and outputs. Note that the model was trained and tested primarily on English data.
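For the guardrail use case, a common pattern is to screen the user input before generation and the model output after it, blocking either one when its risk probability crosses a threshold. The sketch below wires this up with injected callables so it runs without a model. `score_risk` stands in for a call to Granite Guardian and `generate` stands in for the application's own LLM. Both callables, the default risk lists, and the 0.5 threshold are illustrative assumptions rather than IBM recommendations.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# score_risk(risk_name, prompt, response) -> probability in [0, 1].
# Hypothetical stand-in for a Granite Guardian call; response is None
# when only the user prompt is being checked.
ScoreFn = Callable[[str, str, Optional[str]], float]


@dataclass
class GuardedResult:
    allowed: bool
    text: str
    stage: str  # "input", "output", or "passed"


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    score_risk: ScoreFn,
    input_risks: tuple[str, ...] = ("harm", "jailbreak"),
    output_risks: tuple[str, ...] = ("harm",),
    threshold: float = 0.5,
) -> GuardedResult:
    """Screen the prompt, generate a response, then screen the response."""
    # 1. Input guardrail: score the user prompt on its own.
    for risk in input_risks:
        if score_risk(risk, prompt, None) >= threshold:
            return GuardedResult(False, f"Blocked input ({risk}).", "input")

    response = generate(prompt)

    # 2. Output guardrail: score the response together with its prompt.
    for risk in output_risks:
        if score_risk(risk, prompt, response) >= threshold:
            return GuardedResult(False, f"Blocked output ({risk}).", "output")

    return GuardedResult(True, response, "passed")


if __name__ == "__main__":
    # Toy scorer: flags any text containing the word "attack".
    def toy_scorer(risk: str, prompt: str, response: Optional[str]) -> float:
        text = prompt if response is None else response
        return 0.9 if "attack" in text.lower() else 0.1

    print(guarded_generate("How do I attack a server?", lambda p: "...", toy_scorer))
```

Separate input and output risk lists keep prompt-only checks, such as jailbreak detection, off the response path. Passing the scorer in as a function lets you swap a local model for a hosted endpoint without changing the pipeline.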