pragunk/PropagationShield
pragunk/PropagationShield is a 7.6 billion parameter language model, fine-tuned from Qwen2.5-7B-Instruct, specifically designed to detect and resist hallucinations injected by upstream agents in multi-agent AI pipelines. It was trained using Group Relative Policy Optimisation (GRPO) within the PropagationShield OpenEnv, achieving significant improvements in hallucination detection F1 and propagation containment. This model excels at identifying and flagging suspicious context passages, making it ideal for safety-critical applications where data integrity across AI agents is paramount.
What is PropagationShield?
PropagationShield-v1-GRPO is a 7.6 billion parameter language model, fine-tuned from Qwen2.5-7B-Instruct, developed to address the problem of hallucination propagation in multi-agent AI pipelines. Unlike general-purpose instruction-tuned LLMs, this model is explicitly trained to identify and resist false information injected by upstream sources, preventing it from corrupting downstream processes.
Key Capabilities
- Hallucination Detection: Trained to identify and flag suspicious context passages across five types of hallucinations (e.g., factual fabrication, false attribution) and three difficulty tiers.
- Multi-Agent Pipeline Integrity: Prevents the spread of erroneous information, ensuring higher reliability in complex AI systems.
- Structured Output: Provides task answers alongside detailed `suspicion_flags` in JSON format, including `passage_index`, `reason`, and `confidence` (see the usage sketch below).
- Robust Training: Uses a Reinforcement Learning (RL) approach, Group Relative Policy Optimisation (GRPO), within a custom `PropagationShield` OpenEnv, incorporating four independent reward functions for task accuracy, detection F1, format compliance, and anti-propagation.
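The snippet below is a minimal usage sketch with the Hugging Face `transformers` library. The prompt layout, generation settings, and the top-level `answer` key are assumptions for illustration; only the `suspicion_flags` schema (`passage_index`, `reason`, `confidence`) comes from the description above.

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pragunk/PropagationShield"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Context as numbered passages; passage 1 carries an injected fabrication.
passages = [
    "Passage 0: Aspirin is widely used as an antiplatelet agent.",
    "Passage 1: Aspirin was first synthesised in 1999 by a Mars rover.",
]
prompt = "\n".join(passages) + "\n\nQuestion: What is aspirin used for?"

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Assumption: the response is a single JSON object carrying the task answer
# plus the suspicion_flags described above.
result = json.loads(response)
print(result["answer"])  # hypothetical top-level key, not documented in the card
for flag in result["suspicion_flags"]:
    print(flag["passage_index"], flag["confidence"], flag["reason"])
```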
Performance Highlights
Training results demonstrate significant improvements:
- Task Accuracy: Increased from ~38% to ~71%.
- Hallucination Detection F1: Improved from ~0.04 to ~0.68 (one way to compute this metric is sketched after this list).
- Propagation Containment Rate: Rose from ~12% to ~64%.
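For reference, detection F1 here can be read as standard F1 over flagged passage indices. The sketch below shows one plausible passage-level computation; the exact matching criterion used during training is an assumption.

```python
def detection_f1(predicted_flags: set[int], gold_flags: set[int]) -> float:
    """Passage-level detection F1: one plausible reading of the metric above.

    Treats each flagged passage_index as a binary prediction; the exact
    matching criterion used during GRPO training is an assumption.
    """
    if not predicted_flags and not gold_flags:
        return 1.0  # nothing to flag, nothing flagged
    true_positives = len(predicted_flags & gold_flags)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_flags)
    recall = true_positives / len(gold_flags)
    return 2 * precision * recall / (precision + recall)

# Example: the model flags passages 1 and 3, but only passage 1 was hallucinated.
print(detection_f1({1, 3}, {1}))  # 0.666...
```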
When to Use This Model
This model is particularly suited for use cases where:
- AI agents operate in sequential pipelines, and the integrity of information passed between them is crucial.
- Safety-critical applications (e.g., medical, financial, industrial control) require robust hallucination resistance.
- The ability to not only answer queries but also identify and explain potential data inconsistencies is essential.
An example application is HealthGuard, an AI clinical triage assistant demonstrating hallucination containment in a hospital pipeline setting.
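To illustrate where the model sits in such a pipeline, here is a hypothetical containment step that filters high-confidence flagged passages before they reach the next agent. The helper name and confidence threshold are illustrative, not part of the released model.

```python
CONFIDENCE_THRESHOLD = 0.8  # illustrative cut-off, not a documented default

def contain_propagation(passages: list[str], suspicion_flags: list[dict]) -> list[str]:
    """Drop passages the shield flagged with high confidence so they are
    never handed to the next agent in the pipeline."""
    suspicious = {
        flag["passage_index"]
        for flag in suspicion_flags
        if flag["confidence"] >= CONFIDENCE_THRESHOLD
    }
    return [p for i, p in enumerate(passages) if i not in suspicious]

# Example with a hand-written flag of the kind the model emits:
passages = [
    "Vitals are stable.",
    "Patient was prescribed 900mg of a drug that does not exist.",
]
flags = [{"passage_index": 1, "reason": "factual fabrication", "confidence": 0.93}]
print(contain_propagation(passages, flags))  # ['Vitals are stable.']
```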