joynnayvedya/disaster-response-v2

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Apr 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

joynnayvedya/disaster-response-v2 is a 7.6-billion-parameter model fine-tuned from Qwen2.5-7B-Instruct using GRPO. Developed by joynnayvedya, it is optimized for disaster incident triage and coordination within a simulated Emergency Operations Center environment. The model learns to classify, prioritize, and draft responses for realistic disaster scenarios, operating within a defined action space to avoid hallucinations.


Overview

joynnayvedya/disaster-response-v2 is a 7.6-billion-parameter model fine-tuned from Qwen2.5-7B-Instruct with Group Relative Policy Optimization (GRPO). It acts as an AI Emergency Incident Commander inside a simulated Emergency Operations Center (EOC), triaging incoming disaster incidents from initial classification through final ticket submission.
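For readers unfamiliar with GRPO, the core idea can be sketched in a few lines: for each prompt, a group of completions is sampled and each completion's reward is normalized against the group's mean and standard deviation, removing the need for a learned value function. This is a minimal illustrative sketch, not code from this model's training pipeline; the reward values below are invented.

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, four sampled triage rollouts with hypothetical env rewards:
print([round(a, 2) for a in grpo_advantages([0.9, 0.6, 0.3, 0.6])])
# → [1.41, 0.0, -1.41, 0.0]
```

Completions scoring above the group mean get positive advantages and are reinforced; below-average completions are pushed down, which is how the policy gradually converges on valid, high-reward triage actions.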

Key Capabilities

  • Disaster Triage Workflow: Processes incident tickets through a precise 4-step workflow: classify, set_priority, draft_reply, and submit_ticket.
  • Real-world Scenario Training: Trained on scenarios modeled after actual disaster events like the 2018 Kerala Floods and 2020 Vizag LG Polymers Gas Leak, ensuring decisions are grounded in realistic contexts.
  • Robust RL Environment: Operates within a custom-built Reinforcement Learning (RL) environment, "Disaster Response Coordination OpenEnv," whose reward function is designed to resist reward hacking and provides dense partial rewards for accurate team routing, priority setting, and reply quality.
  • Hallucination Mitigation: Successfully trained to operate within a defined action space, overcoming initial tendencies to hallucinate invalid teams or priorities, a common failure mode for smaller models in multi-step workflows.
  • Performance: Achieves an average score of 0.636 across easy, medium, and hard difficulty tiers, close to a hardcoded heuristic baseline (0.682) while relying on actual reasoning rather than fixed rules.

Good For

  • Simulated Emergency Response: Ideal for research and development in AI-driven disaster management and emergency coordination systems.
  • RL Environment Testing: Provides a challenging RL environment for agents learning complex, multi-step decision-making under pressure.
  • Understanding LLM Behavior in RL: Offers insights into how LLMs adapt to structured environments and overcome hallucination tendencies through reinforcement learning.