pragunk/PropagationShield
pragunk/PropagationShield is a 7.6 billion parameter language model, fine-tuned from Qwen2.5-7B-Instruct, specifically designed to detect and resist hallucinations injected by upstream agents in multi-agent AI pipelines. It was trained using Group Relative Policy Optimisation (GRPO) within the PropagationShield OpenEnv, achieving significant improvements in hallucination detection F1 and propagation containment. This model excels at identifying and flagging suspicious context passages, making it ideal for safety-critical applications where data integrity across AI agents is paramount.
What is PropagationShield?
PropagationShield-v1-GRPO is a 7.6 billion parameter language model, fine-tuned from Qwen2.5-7B-Instruct, developed to address the problem of hallucination propagation in multi-agent AI pipelines. Unlike general-purpose instruction-tuned LLMs, this model is explicitly trained to identify and resist false information injected by upstream sources, preventing it from corrupting downstream processes.
Key Capabilities
- Hallucination Detection: Trained to identify and flag suspicious context passages across five types of hallucinations (e.g., factual fabrication, false attribution) and three difficulty tiers.
- Multi-Agent Pipeline Integrity: Prevents the spread of erroneous information, ensuring higher reliability in complex AI systems.
- Structured Output: Provides task answers alongside detailed `suspicion_flags` in JSON format, including `passage_index`, `reason`, and `confidence` (see the usage sketch below).
- Robust Training: Uses a Reinforcement Learning (RL) approach, Group Relative Policy Optimisation (GRPO), within a custom `PropagationShield` OpenEnv, incorporating four independent reward functions for task accuracy, detection F1, format compliance, and anti-propagation.
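The snippet below is a minimal usage sketch with the Hugging Face `transformers` library. The prompt layout, generation settings, and the top-level `answer` key are assumptions for illustration; only the `suspicion_flags` schema (`passage_index`, `reason`, `confidence`) comes from the description above.

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pragunk/PropagationShield"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Context as numbered passages; passage 1 carries an injected fabrication.
passages = [
    "Passage 0: Aspirin is widely used as an antiplatelet agent.",
    "Passage 1: Aspirin was first synthesised in 1999 by a Mars rover.",
]
prompt = "\n".join(passages) + "\n\nQuestion: What is aspirin used for?"

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Assumption: the response is a single JSON object carrying the task answer
# plus the suspicion_flags described above.
result = json.loads(response)
print(result["answer"])  # hypothetical top-level key, not documented in the card
for flag in result["suspicion_flags"]:
    print(flag["passage_index"], flag["confidence"], flag["reason"])
```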
Performance Highlights
Training results demonstrate significant improvements:
- Task Accuracy: Increased from ~38% to ~71%.
- Hallucination Detection F1: Improved from ~0.04 to ~0.68 (one way to compute this metric is sketched after this list).
- Propagation Containment Rate: Rose from ~12% to ~64%.
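For reference, detection F1 here can be read as standard F1 over flagged passage indices. The sketch below shows one plausible passage-level computation; the exact matching criterion used during training is an assumption.

```python
def detection_f1(predicted_flags: set[int], gold_flags: set[int]) -> float:
    """Passage-level detection F1: one plausible reading of the metric above.

    Treats each flagged passage_index as a binary prediction; the exact
    matching criterion used during GRPO training is an assumption.
    """
    if not predicted_flags and not gold_flags:
        return 1.0  # nothing to flag, nothing flagged
    true_positives = len(predicted_flags & gold_flags)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_flags)
    recall = true_positives / len(gold_flags)
    return 2 * precision * recall / (precision + recall)

# Example: the model flags passages 1 and 3, but only passage 1 was hallucinated.
print(detection_f1({1, 3}, {1}))  # 0.666...
```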
When to Use This Model
This model is particularly suited for use cases where:
- AI agents operate in sequential pipelines, and the integrity of information passed between them is crucial.
- Safety-critical applications (e.g., medical, financial, industrial control) require robust hallucination resistance.
- The ability to not only answer queries but also identify and explain potential data inconsistencies is essential.
An example application is HealthGuard, an AI clinical triage assistant demonstrating hallucination containment in a hospital pipeline setting.
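To illustrate where the model sits in such a pipeline, here is a hypothetical containment step that filters high-confidence flagged passages before they reach the next agent. The helper name and confidence threshold are illustrative, not part of the released model.

```python
CONFIDENCE_THRESHOLD = 0.8  # illustrative cut-off, not a documented default

def contain_propagation(passages: list[str], suspicion_flags: list[dict]) -> list[str]:
    """Drop passages the shield flagged with high confidence so they are
    never handed to the next agent in the pipeline."""
    suspicious = {
        flag["passage_index"]
        for flag in suspicion_flags
        if flag["confidence"] >= CONFIDENCE_THRESHOLD
    }
    return [p for i, p in enumerate(passages) if i not in suspicious]

# Example with a hand-written flag of the kind the model emits:
passages = [
    "Vitals are stable.",
    "Patient was prescribed 900mg of a drug that does not exist.",
]
flags = [{"passage_index": 1, "reason": "factual fabrication", "confidence": 0.93}]
print(contain_propagation(passages, flags))  # ['Vitals are stable.']
```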