pragunk/PropagationShield

Text generation · Model size: 7.6B · Quant: FP8 · Context length: 32k · Published: Apr 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights · Concurrency cost: 1

pragunk/PropagationShield is a 7.6 billion parameter language model, fine-tuned from Qwen2.5-7B-Instruct, specifically designed to detect and resist hallucinations injected by upstream agents in multi-agent AI pipelines. It was trained using Group Relative Policy Optimisation (GRPO) within the PropagationShield OpenEnv, achieving significant improvements in hallucination detection F1 and propagation containment. This model excels at identifying and flagging suspicious context passages, making it ideal for safety-critical applications where data integrity across AI agents is paramount.


What is PropagationShield?

PropagationShield-v1-GRPO targets the problem of hallucination propagation in multi-agent AI pipelines, where false information injected by an upstream agent can corrupt every downstream stage. Unlike general-purpose LLMs, it is explicitly trained to identify and resist such injected content rather than repeat it.
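The shield's place in a pipeline can be sketched as a filtering stage between agents. This is an illustrative sketch only: the detector here is a stub standing in for a call to PropagationShield, and the function names are assumptions, not part of any published API. Only the flag fields (`passage_index`, `reason`, `confidence`) come from the model card itself.

```python
# Illustrative multi-agent pipeline with a shield stage between agents.
# The detector is a stub; in practice it would invoke PropagationShield
# and parse its structured output. All names here are assumptions.

def shield_stage(passages, flag_fn):
    """Run the detector over upstream passages and drop flagged ones
    before they reach the downstream agent."""
    flags = flag_fn(passages)                       # detector output
    suspicious = {f["passage_index"] for f in flags}
    clean = [p for i, p in enumerate(passages) if i not in suspicious]
    return clean, flags

def stub_detector(passages):
    """Stand-in for the model: flags an obviously fabricated claim."""
    return [
        {"passage_index": i, "reason": "factual fabrication", "confidence": 0.9}
        for i, p in enumerate(passages)
        if "synthesized in 2015 by NASA" in p
    ]

passages = [
    "Aspirin is commonly used as an antiplatelet agent.",
    "Aspirin was first synthesized in 2015 by NASA.",
]
clean, flags = shield_stage(passages, stub_detector)
print(clean)   # only the trustworthy passage survives
print(flags)   # structured record of what was flagged and why
```

The key design point is that flagged passages are removed (or annotated) *before* the downstream agent sees them, which is what distinguishes containment from mere detection.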

Key Capabilities

  • Hallucination Detection: Trained to identify and flag suspicious context passages across five types of hallucinations (e.g., factual fabrication, false attribution) and three difficulty tiers.
  • Multi-Agent Pipeline Integrity: Prevents the spread of erroneous information, ensuring higher reliability in complex AI systems.
  • Structured Output: Provides task answers alongside detailed suspicion_flags in a JSON format, including passage_index, reason, and confidence.
  • Robust Training: Trained with Group Relative Policy Optimisation (GRPO), a reinforcement-learning method, inside the custom PropagationShield OpenEnv, using four independent reward functions: task accuracy, detection F1, format compliance, and anti-propagation.
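The structured output described above can be parsed and sanity-checked in a few lines. This is a minimal sketch that assumes the model emits a JSON object with `answer` and `suspicion_flags` top-level keys (those key names are an assumption; the flag fields `passage_index`, `reason`, and `confidence` are taken from the capability list above).

```python
import json

# Hypothetical raw model output, following the documented flag fields.
raw = '''{
  "answer": "Aspirin is used as an antiplatelet agent.",
  "suspicion_flags": [
    {"passage_index": 1, "reason": "false attribution", "confidence": 0.87}
  ]
}'''

out = json.loads(raw)

# Basic schema checks before trusting the flags downstream.
for flag in out["suspicion_flags"]:
    assert isinstance(flag["passage_index"], int)
    assert 0.0 <= flag["confidence"] <= 1.0
    print(f"passage {flag['passage_index']}: "
          f"{flag['reason']} ({flag['confidence']:.2f})")
```

Validating the output against the expected schema is worthwhile in practice, since format compliance is itself one of the four training rewards and downstream code should not assume it is perfect.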

Performance Highlights

Training results demonstrate significant improvements:

  • Task Accuracy: Increased from ~38% to ~71%.
  • Hallucination Detection F1: Improved from ~0.04 to ~0.68.
  • Propagation Containment Rate: Rose from ~12% to ~64%.

When to Use This Model

This model is particularly suited for use cases where:

  • AI agents operate in sequential pipelines, and the integrity of information passed between them is crucial.
  • Safety-critical applications (e.g., medical, financial, industrial control) require robust hallucination resistance.
  • The ability to not only answer queries but also identify and explain potential data inconsistencies is essential.

An example application is HealthGuard, an AI clinical triage assistant that demonstrates hallucination containment in a hospital pipeline setting.