joynnayvedya/disaster-response-trained
The joynnayvedya/disaster-response-trained model is a 7.6-billion-parameter Qwen2.5-7B-Instruct variant, fine-tuned by joynnayvedya using GRPO via TRL and Unsloth. The model is optimized for disaster response coordination: it lets an AI agent triage incident reports by classifying them, assigning priorities, drafting replies, and submitting tickets within a multi-step reinforcement learning environment. It learns valid action spaces for emergency management while addressing challenges such as sparse reward collapse in complex, realistic disaster scenarios.
Overview
The joynnayvedya/disaster-response-trained model is a specialized 7.6-billion-parameter Qwen2.5-7B-Instruct variant, developed by joynnayvedya. It was fine-tuned using Group Relative Policy Optimization (GRPO) via Hugging Face's TRL library and Unsloth, specifically for disaster response coordination. The model operates within a custom-built multi-step reinforcement learning environment, Disaster Response Coordination OpenEnv, which simulates 15 real-world disaster scenarios across three difficulty tiers (Easy, Medium, Hard).
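The multi-step episode structure described above can be sketched as a minimal mock environment. This is an illustrative sketch only: the actual Disaster Response Coordination OpenEnv API is not documented here, and the `Scenario` and `TriageEnv` names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """One of the simulated disaster scenarios (15 in the real environment)."""
    description: str
    tier: str  # "Easy", "Medium", or "Hard"

@dataclass
class TriageEnv:
    """Hypothetical stand-in for the multi-step triage environment."""
    scenario: Scenario
    steps: tuple = ("classify", "set_priority", "draft_reply", "submit_ticket")
    step_idx: int = 0
    done: bool = False

    def reset(self) -> str:
        self.step_idx, self.done = 0, False
        return self.scenario.description  # initial observation

    def step(self, action: str):
        """Advance one stage of the 4-step workflow, returning a per-step reward."""
        expected = self.steps[self.step_idx]
        reward = 1.0 / len(self.steps) if action == expected else 0.0
        self.step_idx += 1
        self.done = self.step_idx == len(self.steps)
        return reward, self.done

env = TriageEnv(Scenario("Flooded road blocks hospital access", "Medium"))
obs = env.reset()
total = 0.0
for act in ("classify", "set_priority", "draft_reply", "submit_ticket"):
    reward, done = env.step(act)
    total += reward
```

An agent that completes every step in order accumulates the full episode reward of 1.0; the real environment's observations, action formats, and reward weights may differ.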
Key Capabilities
- Automated Incident Triage: The model processes incident reports through a 4-step workflow: classify, set priority, draft reply, and submit ticket.
- Contextual Decision-Making: It learns to route incidents to appropriate teams and assign priorities based on scenario details.
- Robust Action Space Learning: Training successfully enabled the model to produce valid outputs for team assignments and priorities, overcoming initial challenges where the base model hallucinated invalid actions.
- Reinforcement Learning Optimization: Uses a dense, per-step reward function that grants partial credit at every step, rather than a sparse end-of-episode signal, to guide learning.
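The contrast between the dense, per-step reward described above and a sparse end-of-episode signal can be sketched as follows. The component weights (0.1 for validity, 0.15 for correctness) are illustrative assumptions, not the model card's actual reward function.

```python
def dense_reward(output_valid: bool, output_correct: bool) -> float:
    """Partial credit at a single workflow step: a merely *valid* output
    (e.g. a real team name or priority level) already earns some reward,
    so the policy receives signal before it can solve whole episodes."""
    reward = 0.0
    if output_valid:
        reward += 0.1
    if output_correct:
        reward += 0.15
    return reward

def sparse_reward(step_outcomes) -> float:
    """Sparse alternative: an all-or-nothing signal only at episode end."""
    return 1.0 if all(correct for _, correct in step_outcomes) else 0.0

# A partially correct episode: three steps right, one wrong.
steps = [("classify", True), ("set_priority", True),
         ("draft_reply", False), ("submit_ticket", True)]

dense_total = sum(dense_reward(True, correct) for _, correct in steps)
sparse_total = sparse_reward(steps)
```

Under the sparse scheme this episode earns nothing; under the dense scheme it still earns credit for valid outputs and the three correct steps, which is what protects against the reward-collapse failure mode mentioned earlier.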
When to Use This Model
This model is ideal for research and development in:
- AI-driven Emergency Management: Exploring how LLMs can assist in rapid incident assessment and resource allocation during disasters.
- Reinforcement Learning in Complex Environments: Investigating RL strategies for multi-step, interdependent tasks with real-time feedback.
- Benchmarking LLM Performance: Evaluating the ability of smaller LLMs (7B class) to handle complex, multi-step decision-making under pressure, particularly in scenarios prone to sparse reward collapse.
While still a research artifact, the model demonstrates a foundational capability in navigating the complexities of disaster incident triage, passing all three difficulty tiers (score ≥ 0.6) in its baseline evaluations.
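The pass criterion cited above (score ≥ 0.6 on every tier) can be expressed as a simple check. The example scores are hypothetical; the card does not report the actual per-tier numbers.

```python
PASS_THRESHOLD = 0.6  # per-tier pass criterion from the baseline evaluation

def passes_all_tiers(tier_scores: dict) -> bool:
    """True when every difficulty tier meets the pass threshold."""
    return all(score >= PASS_THRESHOLD for score in tier_scores.values())

# Hypothetical scores for illustration only.
example_scores = {"Easy": 0.90, "Medium": 0.75, "Hard": 0.62}
result = passes_all_tiers(example_scores)
```

A single tier falling below 0.6 (for example, a Hard score of 0.55) would fail the overall evaluation under this criterion.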